What is the minimal size quantum circuit required to exactly implement a specified n-qubit
unitary operation, U, without the use of ancilla qubits? We show that a lower bound on the 
minimal size is provided by the length of the minimal geodesic between U and the identity, I, where 
length is defined by a suitable Finsler metric on the manifold SU(2^n). The geodesic curves on
these manifolds have the striking property that once an initial position and velocity are set, the 
remainder of the geodesic is completely determined by a second order differential equation known 
as the geodesic equation. This is in contrast with the usual case in circuit design, either classical 
or quantum, where being given part of an optimal circuit does not obviously assist in the design of 
the rest of the circuit. Geodesic analysis thus offers a potentially powerful approach to the problem 
of proving quantum circuit lower bounds. In this paper we construct several Finsler metrics whose 
minimal length geodesics provide lower bounds on quantum circuit size. For each Finsler metric we
give a procedure to compute the corresponding geodesic equation. We also construct a large class
of solutions to the geodesic equation, which we call Pauli geodesics, since they arise from isometries
generated by the Pauli group. For any unitary U diagonal in the computational basis, we show 
that: (a) provided the minimal length geodesic is unique, it must be a Pauli geodesic; (b) finding 
the length of the minimal Pauli geodesic passing from I to U is equivalent to solving an exponential 
size instance of the closest vector in a lattice problem (CVP); and (c) all but a doubly exponentially 
small fraction of such unitaries have minimal Pauli geodesics of exponential length.

PACS numbers: 03.67.Lx,02.30.Yy 



I. INTRODUCTION 
A. Overview 

A central problem of quantum computation is to de- 
termine the most efficient way of implementing a desired 
unitary operation. Although insight into this problem 
has been obtained for certain specific unitary operations, 
no useful general techniques for determining the most 
efficient implementation are known. 

The interest in this problem arises from the desire 
to find classes of unitary operations which can be im- 
plemented efficiently, i.e., using polynomial resources. 
Using a non-constructive measure-theoretic argument, 
Knill has shown that a generic unitary operation re- 
quires exponentially many quantum gates even to ap- 
proximate. Despite this result, no explicit construction 
of a natural family of unitary operations requiring expo- 
nential size quantum circuits is known. 

An analogous situation holds classically, where Shannon [3]
(see Theorem 4.3 on page 82 of [?]) used a
non-constructive counting argument to show that most
Boolean functions f : {0,1}^n → {0,1} require circuits
of exponential size to compute. Despite this result, no 
explicit construction of a natural family of functions re- 
quiring exponential size circuits is known. 

The lack of explicit constructions of hard-to-compute
operations is symptomatic of the general difficulty en- 



countered in proving lower bounds on the computational 
resources required to synthesize specified classes of oper- 
ations, both quantum and classical. The most celebrated 
instance of this difficulty is, of course, the problem of 
proving P ≠ NP. More generally, computer scientists
suspect many separations between computational com- 
plexity classes, but techniques to prove them are elusive. 

The problem motivating the present paper is inspired 
by the problems just described, but is more restricted 
in scope. Suppose U is a special unitary operation on n
qubits, i.e., a 2^n × 2^n unitary operation with unit
determinant¹. Let G be a set of unitary gates which is universal
on n qubits, e.g., the set of single-qubit unitary operations
and any fixed entangling two-qubit gate [?].
We require G to be exactly universal, i.e., the group
generated by G should be SU(2^n), not some dense subset.
Then we define m_G(U) to be the minimal number of gates
from G required to exactly synthesize U.

In this paper we explain how to introduce a metric
d(·,·) on SU(2^n) such that d(I, U) ≤ m_G(U), where I is
the n-qubit identity operation. Thus the metric d provides
a lower bound on the number of gates required to
implement U. We define our metric by first specifying a
structure known as a local metric, which can be thought
of as assigning a distance to nearby points on the manifold.
This local metric induces a natural notion of curve
length, which can then be used to define d(·,·) as the
¹ In this paper we work mainly with SU(2^n), rather than U(2^n).
As a consequence, all unitaries are assumed to have unit deter- 
minant, unless otherwise remarked. 






infimum over lengths of curves between two points. 



B. Motivating ideas 

Two key ideas motivate our geometric approach to the 
problem of proving lower bounds on m_G(U). The first
idea is that it is easier to minimize a smooth function 
on a smooth space rather than a general function on a 
discrete space, and thus whenever possible we replace 
discrete structures by smooth structures². This is, of
course, an idea familiar to any undergraduate: it is far 
easier to minimize a smooth function defined on the re- 
als than it is to minimize that function when restricted 
to the integers. The reason, of course, is that the differ- 
ential calculus enables us to minimize smooth functions 
on smooth spaces, using the powerful principle that such 
a function f(·) should be stationary at a minimum, and
thus satisfy the equation f'(x) = 0. 

This observation motivates us to replace the problem
of finding a minimal quantum circuit (essentially,
an optimization over the discrete space of possible quantum
circuits³) with a closely related problem, which we
call the Hamiltonian control problem. In the Hamiltonian
control problem we attempt to minimize a smooth cost 
function with respect to a smooth set of Hamiltonian 
control functions. This formulation allows us to apply 
the principle that a smooth functional is stationary at 
its minimum. Technically, we carry this out by using the 
calculus of variations to study the minimal cost Hamil- 
tonian control function. As we describe below, this idea 
leads naturally to a geodesic equation whose solutions 
are control functions which are local minima of the cost 
function. 

A second key idea, overlapping the first, motivates our 
approach. This idea is built on an analogy to the many 
principles of physics which can be formulated in one of 
two equivalent ways: a description in which the motion 
of a particle is described in terms of a local force law, 
and a description in which motion is described in terms 
of minimizing some globally defined functional. To make 
this analogy precise we use the example of test particle 
motion in general relativity, but similar remarks may be 
made in many other areas of physics, including classical 
mechanics (Newtonian versus Lagrangian formulations), 
and optics (geometric optics versus Fermat's principle). 

It is a basic principle of general relativity that test
particles move along geodesics of spacetime, i.e., move so as
to minimize a globally defined functional, the pseudo- 
Riemannian distance. This principle turns out to be 



2 In a related vein, Linial has written a stimulating survey de- 
voted to the use of geometric ideas in combinatorics. 

3 Of course, some (though not all) universal gate sets have a 
smooth structure. By "discrete" we mean here that the applica- 
tion of a quantum gate is a discrete event, and so the number of 
quantum gates applied is necessarily a non-negative integer. 



equivalent to the particle following the geodesic equation,

$$\frac{d^2 x^j}{dt^2} + \Gamma^j_{kl} \frac{dx^k}{dt} \frac{dx^l}{dt} = 0, \tag{1}$$

where x^j are co-ordinates for the position on the manifold,
we sum over the repeated indices k and l, and
the Christoffel symbols Γ^j_{kl} are real numbers determined
by the local geometry of spacetime. This reformulation 
shows that minimizing the length traversed is equivalent 
to following what is essentially a local force law: the 
geodesic equation tells us how a particle ought to ac- 
celerate, given its current velocity and the local geom- 
etry. More generally, on any Riemannian or pseudo- 
Riemannian manifold the problem of geodesic motion 
turns out to be equivalent to following a local force law, 
the same geodesic equation of Equation (1).

This situation is in sharp contrast to the problem of 
finding an optimal circuit to compute a function. Suppose
someone gives us a partially complete circuit to compute
a function, f, and asks us to complete the circuit.
In general, there are no useful techniques for determining 
the best way of completing the circuit, short of an ex- 
haustive search. But given an arbitrarily small arc along 
a geodesic on a Riemannian manifold, the remainder of 
the geodesic is completely determined by the geodesic 
equation. Indeed, provided we know the velocity at some 
given point the remainder of the geodesic is completely 
determined by the geodesic equation. 
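
The following minimal sketch illustrates this determinism on a simpler manifold than the one studied in this paper. It assumes the unit 2-sphere with the usual coordinates (θ, φ), for which the geodesic equation reads θ'' = sin θ cos θ (φ')² and φ'' = −2 cot θ θ' φ'. The function names and the choice of a fourth-order Runge-Kutta integrator are illustrative assumptions, not part of the original text. Given only an initial position and velocity, the integrator reproduces the corresponding great circle.

```python
import math


def geodesic_rhs(state):
    """Right-hand side of the geodesic equation on the unit 2-sphere.

    state = (theta, phi, dtheta, dphi); the nonzero Christoffel symbols are
    Gamma^theta_{phi phi} = -sin(theta) cos(theta) and
    Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
    """
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)


def rk4_step(state, h):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, scale):
        return tuple(si + scale * ki for si, ki in zip(s, k))

    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_geodesic(theta0, phi0, dtheta0, dphi0, t_final=1.0, steps=1000):
    """Follow the geodesic fixed by an initial position and velocity."""
    state = (theta0, phi0, dtheta0, dphi0)
    h = t_final / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


# Start on the equator with a unit-speed velocity tilted away from it.
theta, phi, dtheta, dphi = integrate_geodesic(math.pi / 2, 0.0, 0.6, 0.8)
speed_squared = dtheta ** 2 + math.sin(theta) ** 2 * dphi ** 2
```

The speed is conserved along the computed curve, and the endpoint lies on the great circle through the starting point in the starting direction, as expected for a geodesic on the sphere.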

This analogy motivates our formulating the Hamilto- 
nian control problem so that the cost function to be min- 
imized is a local metric structure on a suitable type of 
manifold, which we shall argue below is a Finsler man- 
ifold, a type of manifold generalizing the Riemannian 
manifolds most familiar to physicists. Just as for Riemannian
manifolds, we will see that the geodesics on a
Finsler manifold are determined by a geodesic equation
of the form of Equation (1), but where the coefficients
are a generalized type of Christoffel symbol for the 
Finsler manifold. Thus, once the initial position and ve- 
locity (or any small arc) are known, the remainder of the 
geodesic is uniquely determined by the geodesic equation. 

An important caveat to this otherwise encouraging sit- 
uation is that while the solutions to the geodesic equation 
are local minima of the cost function, they may not be 
global minima⁴. That is, there may be multiple geodesics
connecting I and U, and we will see some explicit examples
of this later. Nonetheless, the minimal length curve
is guaranteed to be a geodesic⁵, and thus geodesic analysis
offers a potentially powerful approach to proving lower
bounds on m_G(U).



⁴ The analogous situation in ordinary calculus is, of course, the
fact that f'(x) = 0 may have multiple solutions.

⁵ For this reason, we use the terms "minimal length curve" and
"minimal length geodesic" interchangeably. 






C. Structure and main results 

The local metric is a right-invariant G-bounding
Finsler metric: As just described, our strategy is to
define and study suitable classes of local metrics on the 
manifold SU(2 n ). What properties should these local 
metrics have? In order to find a suitable local metric our 
strategy is to sequentially impose more and more restric- 
tive conditions, motivated by various properties that we 
desire the minimal length curves to have. 

We begin in Section II (Subsections II A and II B) with
a simple general argument motivating the use of local
metrics as a measure of the cost of implementing a unitary
operation. In Section II (Subsection II B) a simple
extension of the same argument is used to motivate the
condition that the local metric be right-invariant, which
corresponds to the physical requirement that the cost of
applying a particular Hamiltonian should not depend on
when that Hamiltonian is applied, i.e., it is an expression
of homogeneity.

In Section II (Subsection II C) we impose the additional
requirement that the local metric should be capable
of providing lower bounds on gate complexity, i.e.,
d_F(I, U) ≤ m_G(U), where d_F(I, U) is the distance between
I and U induced by the local metric, which we
denote by F. In particular, we prove a simple theorem
giving sufficient conditions on F in order that the inequality
d_F(I, U) ≤ m_G(U) hold. We call local metrics
satisfying this condition G-bounding.

Finally, in Section II (Subsection II D) we impose the
additional requirement that the local metric have suffi- 
cient smoothness and convexity properties to allow the 
calculus of variations to be applied to study the minimal 
length curves. We will show that this is equivalent to 
requiring that the local metric be a Finsler metric. 

Finsler metrics are a class of local metrics generalizing 
the Riemannian metrics familiar to physicists from the 
study of general relativity. In Riemannian geometry the 
length of a small displacement on the manifold is deter- 
mined by the square root of some quadratic form in the 
displacement. On a Finsler manifold, this special form for 
the local metric is replaced by a general norm function, 
subject only to the most general smoothness and con- 
vexity properties sufficient to ensure that a second order 
differential equation holds for the geodesics. Essentially,
Finsler metrics may be viewed as the most general class 
of local metrics giving rise to such a geodesic equation. 

Summing up, the main result of Section II (Subsections
II A through II D) is that the most suitable local
metric structure is a right-invariant G-bounding Finsler
metric.

Construction of local metrics providing lower
bounds on m_G(U): In Section II (Subsection II E) we
construct three important families of right-invariant G-bounding
local metrics, which we denote F_1, F_p and F_q.
As each of these local metrics is G-bounding, they all
give rise to lower bounds on m_G(U), through the results
of Subsection II C. However, F_1 and F_p lack some of the
smoothness and convexity properties required by Finsler
metrics. In order to analyse F_1 and F_p within the desired
framework of Finsler geometry, in an appendix we construct
a parameterized family of right-invariant Finsler
metrics F_{1Δ} and F_{pΔ}, with the property that F_{1Δ} → F_1
and F_{pΔ} → F_p as Δ → 0. That is, we can approximate
F_1 and F_p as well as desired using suitable families
of right-invariant Finsler metrics, and thus study them
using the geodesic equation of Finsler geometry.

Computing the geodesic equation: In Section III
we explain how to compute the geodesic equation for
each of our families of right-invariant Finsler metrics
F_{1Δ}, F_{pΔ}, and F_q. The main tool used in the computation
of the geodesic equation is a generalization of the
Baker-Campbell-Hausdorff formula used by physicists,
which is used to accomplish a necessary change of
co-ordinates on SU(2^n). As a simple illustration of the
utility of the geodesic equation, in Section III
we consider the effect ancilla qubits have on
the minimal length curves, showing that for Finsler metrics
satisfying suitable conditions (F_q is an example of
such a Finsler metric) there is a neighbourhood of the
identity in which the presence or absence of ancilla qubits
does not affect the minimal length curves.

Construction of the Pauli geodesics: In Section IV
(Subsections IV A and IV B) we construct a class
of curves in SU(2^n) which are geodesics for all three of
the Finsler metrics F_{1Δ}, F_{pΔ}, and F_q. We call these Pauli
geodesics, as they arise naturally from a class of isometries
associated with the Pauli group.

Finding minimal length Pauli geodesics is
equivalent to solving an instance of closest vector
in a lattice: In general, many geodesics may connect
any two points on SU(2^n), and the problem of finding
the minimal length curve connecting two points may
be viewed as the problem of finding the minimal length
geodesic. We show in Section IV (Subsection IV C) that
the problem of finding the minimal length Pauli geodesic
through a unitary U which is diagonal in the computational
basis is equivalent to solving an exponential size
instance of the closest vector in a lattice problem (CVP),
well known from computer science.
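
For readers unfamiliar with CVP, the following minimal sketch states the problem concretely: given a lattice basis and a target vector, find the integer combination of basis vectors closest to the target. It is a brute-force search over a small box of integer coefficients and is meant only to fix ideas. The Euclidean norm, the tiny two-dimensional basis, and the name `closest_lattice_vector` are illustrative assumptions; this is not the exponential size instance that arises for Pauli geodesics, and the search is exact only when the true minimizer has coefficients inside the box.

```python
import itertools
import math


def closest_lattice_vector(basis, target, coeff_range=3):
    """Brute-force CVP over integer coefficients in [-coeff_range, coeff_range].

    basis  : list of basis vectors (each a list of numbers)
    target : the vector to approximate
    Returns the best lattice point found and its Euclidean distance to target.
    """
    dim = len(target)
    best_point, best_dist = None, math.inf
    coefficient_box = range(-coeff_range, coeff_range + 1)
    for coeffs in itertools.product(coefficient_box, repeat=len(basis)):
        # Lattice point = integer combination of the basis vectors.
        point = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim)]
        dist = math.sqrt(sum((p - t) ** 2 for p, t in zip(point, target)))
        if dist < best_dist:
            best_point, best_dist = point, dist
    return best_point, best_dist


# Lattice of points with even integer coordinates, and a generic target.
point, dist = closest_lattice_vector([[2, 0], [0, 2]], [0.9, 3.2])
```

The number of candidate coefficient vectors grows exponentially with the lattice dimension, which is one reason the size of the CVP instance matters.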

This reduction to CVP is perhaps somewhat ironic, 
given our general philosophy of replacing discrete struc- 
tures by smooth structures. However, we will see that 
the CVP instance is far simpler and has a much more 
elegant structure than the original problem of finding a 
minimal-size quantum circuit. 

The reduction to CVP is only for the problem of finding
the minimal length Pauli geodesic, not the minimal
geodesic of any type. Also in Subsection IV C we show
that provided there is a unique minimal length geodesic
through U, then the minimal length geodesic is a Pauli
geodesic, and thus the solution of the CVP instance will
give the distance d_F(I, U), and so provide a lower bound
on m_G(U). Unfortunately, we are not able to say how
generically this situation holds. Standard examples of
Riemannian manifolds such as the sphere, flat space and
the hyperbolic spaces suggest it may be true of many or
perhaps most U. Against this, in Subsection IV E we give
an example where the minimal Pauli geodesic is provably
not the minimal length geodesic of any type.

Minimal length Pauli geodesics of exponential
length exist: Using the connection to CVP, in Section IV
(in Subsection IV D) we prove that the overwhelming
majority of unitaries diagonal in the computational
basis have minimal length Pauli geodesics of exponential
length. The method of proof is a volume argument,
suggested to the author by Oded Regev.

Caveats: Several additional caveats to our results 
should be made clear. 

The first caveat is that although we have made consid- 
erable progress understanding the geodesic structure of 
our local metrics, we are a long way from a complete
understanding of either the geodesics or the minimal length
curves of those local metrics. As an example of the type 
of basic question that is still unresolved, we do not even 
know for sure if minimal curves of exponential length ex- 
ist. 

These difficulties are perhaps not surprising, as in general
it is an extremely difficult problem to understand the
minimal length geodesics on a manifold. Indeed, there
are few manifolds even of Riemannian type for which the
geodesics are completely understood. This paper should
thus be viewed as a first step toward an understanding
of these minimal length geodesics.

The second caveat is that although we show that
d_F(I, U) ≤ m_G(U) for right-invariant G-bounding Finsler
metrics, F, it is by no means clear how to choose F
in such a way as to achieve the desirable property that
m_G(U) and d_F(I, U) be polynomially equivalent. I conjecture
that all three of F_{1Δ}, F_{pΔ}, and F_q have this property,
for suitable parameter choices. The one concrete
step in this direction we take is to show that as Δ → 0
the distance d_{F_{1Δ}}(I, U) can be interpreted as the minimal
time required to generate U, using a set of control
Hamiltonians each of which can be efficiently simulated
in the standard quantum circuit model.

The third caveat is a reiteration and extension of the 
earlier point that while understanding the behaviour of 
m_G(U) would be extremely interesting in its own right,
it is really a toy version of some much more interesting 
problems in quantum computational complexity. There 
are three important ways the determination of m_G(U)
falls short of the problems of interest in quantum com- 
putational complexity: (1) the requirement that the syn- 
thesis of U be exact, rather than approximate; (2) the 
requirement that this synthesis be performed without 
the benefit of additional workspace, i.e., without ancilla 
qubits initially prepared in a standard state; and (3) the 
lack of a uniformity requirement on the circuit imple- 
menting U, i.e., there is no requirement that there be a 
polynomial-time Turing machine efficiently generating a 
description of the circuit. Obviously it is to be hoped 
that these shortcomings can be mitigated by future ex- 
tensions of the present approach. 



The fourth caveat is in relation to the presentation of 
the paper. The paper is intended to be accessible to 
physicists, mathematicians, and computer scientists, es- 
pecially those involved in quantum information science, 
but makes use of ideas from certain areas of mathematics 
- differential geometry, Finsler geometry, and the vec- 
torization of matrix equations — that may be unfamiliar 
to many readers. As the goal of the paper is primarily 
to synthesize a program for investigating quantum lower 
bounds, rather than to solve specific technical problems 
previously considered inaccessible, I have included con- 
siderable introductory material and references, as well 
as describing some arguments in considerable detail, in 
order to make the broad picture as clear as possible. 



D. Prior work 

We divide prior work up into research from four differ- 
ent points of view: optimal quantum control, universality 
constructions for quantum circuits, quantum circuit de- 
sign, and computational complexity theory. 

Optimal quantum control. The work most similar 
in spirit to the present paper comes from the field of 
quantum control, particularly optimal quantum control. 
Quantum control is a large field, and we will not attempt
to comprehensively survey it here; see, e.g., [?]
for an entry into the literature, and further references.

Only a relatively small part of the quantum control 
literature has been concerned with time-optimal methods 
for generating unitary operations. These methods may 
be subdivided into two (overlapping) approaches: those 
based on geometric control theory, and those based on 
using the calculus of variations to minimize some global 
cost functional without a direct geometric interpretation. 

The research most closely related to the present paper
is the work on geometric quantum control pursued
by Khaneja, Brockett and Glaser (see also [?]),
by Zhang and Whaley, and by Dirr et al. [16]. We
now briefly outline the approach taken in this prior work, 
in order to contrast it with the approach taken in the 
present paper. 

These prior works formulate the problem of quantum
control as the problem of synthesizing a unitary operation
U using a time-dependent control Hamiltonian
H = H_d + Σ_j v_j H_j, where H_j are control Hamiltonians,
v_j are real control functions, and H_d is the drift
Hamiltonian. The goal is to synthesize U in the minimal
possible time. It is assumed that the control functions
v_j can be made arbitrarily intense, for no cost, but the
Lie group K which they generate is a strict subgroup of
the total Lie group SU(2^n). K might be, for example,
the space of local unitary operations on n qubits, while
H_d is some global entangling Hamiltonian connecting all
the qubits [17]. Thus, the time taken to synthesize U is
just the total time for which the drift Hamiltonian H_d is
applied.

Khaneja et al. [12] show that this problem is equivalent
to finding the minimal length geodesics on the coset
space SU(2^n)/K, when that space is equipped with a
suitable metric structure. Furthermore, they show that
in the special case when SU(2^n)/K is a Riemannian
symmetric space these geodesics have an exceptionally
simple structure that enables the minimal time to be
calculated exactly. This is the case, for example, for
SU(4)/SU(2) ⊗ SU(2).

The power of this approach comes from the connection
to the theory of symmetric spaces, which have a beautiful
theory that is exceptionally well understood (see,
e.g., [?]). This is also its limitation, for it is only in
very special cases that SU(2^n)/K is a symmetric space.
See, e.g., [?] for a discussion of the limitations on
K imposed by this requirement. In practice, Khaneja et
al., Zhang and Whaley, and Dirr et al. were all limited
to studying geodesics for special cases where n ≤ 3. (We
note that some closely related results for n = 2 have been
obtained in [19, 20, 21], using a different approach based
on the theory of majorization.)

Our approach is similar in spirit to this prior work, 
but differs significantly in substance. We do not iden- 
tify any special subgroup K, and thus work directly 
with the space SU(2^n), using the general framework of
Finsler geometry, rather than the Riemannian geometry
of SU(2^n)/K. In this framework, we relate the length of
the minimal Finsler geodesic to the minimal size quan- 
tum circuit. As our interest is motivated by quantum 
computation, we are primarily interested in the case of 
an arbitrary number of qubits, n, and we succeed in
constructing geodesics valid for any n, and obtaining some
general (albeit limited) results about the minimal length
geodesics for arbitrary n.

Optimal control theory has also given rise to a sec- 
ond strand of work related to the present paper, with a 
rather more extensive literature than the quantum geometric
control literature. Rather than reviewing all this
literature in detail, we refer the reader to a recent sample
[?], and the references therein.

Broadly speaking, the typical setting for this work 
is the problem of finding the optimal way of generat- 
ing a one- or two-qubit quantum gate, using a specified 
Hamiltonian (e.g., a two-level atom coupled to an exter- 
nal electromagnetic field) containing one or more con- 
trol parameters. An ad hoc functional is constructed, 
representing the cost of generating the gate in terms of 
quantities such as the power consumed. The calculus 
of variations is then employed to derive a condition for 
that functional to be maximized, typically resulting in a 
two-point boundary value problem for some second order 
differential equation, which is then solved numerically us- 
ing iterative techniques. This body of work is thus much 
more concerned with obtaining numerical results for spe- 
cific Hamiltonians and specific one- and two-body uni- 
taries, rather than the general n-qubit questions of most 
interest to us. 

Universality constructions for quantum cir- 
cuits. Researchers working on universality constructions 



for quantum circuits have done considerable work optimizing
their constructions. This began in the early papers
by Barenco et al. and Knill [?], who showed that
the universality constructions in [28] are near-optimal
for a generic unitary operation. This work has subsequently
been improved by many groups; see, for example,
[?] and references therein. This line of
investigation appears superficially to be closely related 
to the topic of the present paper, but that appearance 
is misleading. The reason is that this prior work inves- 
tigates constructions which are only generically optimal, 
and there is no reason to believe that the constructions 
obtained in any of these papers will be optimal for any 
particular unitary operation⁶, and thus they cannot be
used to deduce lower bounds on the minimal number of 
circuit elements required to synthesize a specific unitary 
operation. 

Optimal quantum circuit design. Another topic 
which has attracted considerable prior interest is the de- 
sign of optimal quantum circuits for specific tasks. This 
has become a major topic of ongoing investigation; un- 
fortunately no general survey exists, and a list of refer- 
ences would run to many hundreds. However, the key 
point is that these papers derive optimal or near-optimal 
circuits only for certain special classes of unitary operations,
e.g., Cleve and Watrous' [33] fast parallel circuits
for the quantum Fourier transform. Thus this work does 
not provide a general approach to the problem of finding 
optimal circuits for unitary operations, nor for the prob- 
lem of finding lower bounds on the number of quantum 
gates required to perform a given (but arbitrary) unitary 
operation. 

Computational complexity. The theory of com- 
putational complexity exists in large part, of course, to 
analyse the time cost of computation, in both classi- 
cal and quantum computing models. General references 
are [?].

Within quantum computational complexity, the work 
of most relevance to the present paper is the work on
oracle lower bounds, a selection of which may be found
in [?]; see also the references therein. The
oracle setting offers substantial technical simplifications 
when compared with the problem of proving uncondi- 
tional lower bounds on the difficulty of synthesizing uni- 
tary operations, but it is also widely regarded as a much 
less interesting setting. The results in the present paper 
are much less complete than some of the results obtained 
in the oracle setting, but have the advantage of being in 
the unconditional setting. 

Within classical computational complexity, it is worth
noting a surface resemblance between the present work
and the approach to the P ≠ NP problem due to
Mulmuley and Sohoni [?] (see also [?]). This work
also uses geometric techniques to address the problem of
proving lower bounds. However, the techniques used are
from algebraic geometry, based on geometric invariant
theory, and thus are not obviously related to the ideas
used in the present paper, which are based on Riemannian
and Finsler geometry.

⁶ A notable exception is that the algorithm in [29] does reproduce
the fast quantum Fourier transform circuit. However, [29] notes
that the construction in that paper is not optimal in general.



II. THE HAMILTONIAN CONTROL PROBLEM 
AND METRICS ON MANIFOLDS 

In this section we introduce the Hamiltonian control 
problem, whose goal is to find a time-dependent Hamil- 
tonian H(t) synthesizing U. For a given Hamiltonian we 
then define a corresponding cost, which is a functional of 
H(·). We argue on general grounds that the cost function
ought to arise from a right-invariant Finsler metric
on the manifold SU(2^n), and provide general conditions
in order that such a cost function provide a lower bound
on m_G(U). Furthermore, we introduce several Finsler
metrics satisfying these conditions, and discuss how ef- 
fective each of these Finsler metrics is likely to be as a 
means of proving quantum circuit lower bounds. 



A. The Hamiltonian control problem 

Let U be a special unitary operation on n qubits. Our
goal is to synthesize U using a traceless⁷ control Hamiltonian
H(t). It is convenient to expand the control Hamiltonian
in terms of the generalized Pauli matrices, which
we take to be the set of n-fold tensor products of the
single-qubit Pauli matrices, omitting I^{⊗n}. The resulting
expansion is:

$$H(t) = \sum_{\sigma} \gamma_\sigma(t)\, \sigma. \tag{2}$$

Note that we always omit the term σ = I^{⊗n} from such
sums. The functions γ_σ(t) are known as control functions;
the (4^n − 1)-dimensional vector γ(t) whose entries
are the individual control functions γ_σ(t) is known as
the control function. Sometimes it is convenient to omit
the t and just write γ to denote the entire vector-valued
control function γ(t).
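
As a concrete illustration of the expansion (2), the following minimal sketch builds the 4^n − 1 generalized Pauli matrices for small n and recovers the coefficients of a given traceless Hermitian matrix at a fixed time. It relies on the standard orthogonality relation tr(σσ') = 2^n δ_{σσ'}, which gives γ_σ = tr(σH)/2^n. The function names and the string labels such as "XZ" are illustrative assumptions introduced here, and the plain nested-list representation is chosen only to keep the example self-contained.

```python
import itertools

# Single-qubit Pauli matrices as nested lists.
SINGLE = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def generalized_paulis(n):
    """All n-fold tensor products of Pauli matrices, omitting the identity."""
    paulis = {}
    for label in itertools.product("IXYZ", repeat=n):
        if all(ch == "I" for ch in label):
            continue  # the identity term is always omitted from the sums
        mat = [[1]]
        for ch in label:
            mat = kron(mat, SINGLE[ch])
        paulis["".join(label)] = mat
    return paulis


def trace_of_product(a, b):
    """tr(a b) computed without forming the full matrix product."""
    d = len(a)
    return sum(a[i][k] * b[k][i] for i in range(d) for k in range(d))


def pauli_coefficients(h, n):
    """Coefficients gamma_sigma = tr(sigma h) / 2^n of a Hermitian matrix h."""
    dim = 2 ** n
    return {
        label: (trace_of_product(sigma, h) / dim).real
        for label, sigma in generalized_paulis(n).items()
    }


# Example: H = 0.5 X(x)Z + 0.25 Y(x)I on two qubits.
paulis = generalized_paulis(2)
h = [
    [0.5 * paulis["XZ"][i][j] + 0.25 * paulis["YI"][i][j] for j in range(4)]
    for i in range(4)
]
gamma = pauli_coefficients(h, 2)
```

For two qubits the dictionary `gamma` has 15 entries, matching the dimension 4^n − 1 of the control function.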

In order that U be correctly synthesized, Schrödinger's
equation requires that the control function γ(t) satisfies
the equations:

$$\frac{dV}{dt} = -iH(t)V; \quad V(0) = I; \quad V(1) = U. \tag{3}$$



⁷ A priori there is no need for the Hamiltonian to be traceless.
However, by adding a suitable multiple of the identity we can 
always make the Hamiltonian traceless, and so there is no loss of 
generality in making this assumption. 



We have chosen t = 0 as the initial time, and t = 1 as the
time at which we desire the evolution to reach U. These
choices are arbitrary, and it is not difficult to prove that
the definition given below of the cost of synthesizing U
does not depend on the values chosen for these times.

It is helpful to assume that γ(t) is a smooth (i.e., C^∞)
function of t. We say that a smooth control function γ
satisfying Equations (2) and (3) is a valid control function
generating U.
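As a concrete illustration of Equations (2) and (3), the following is a minimal numerical sketch, not part of the original argument. It builds the generalized Pauli matrices, assembles H(t) from a control function, and integrates Schrödinger's equation with small piecewise-constant steps. The helper names (`generalized_paulis`, `unitary_exp`, `synthesize`, `example_gamma`) and the step count are assumptions made for this example only.

```python
import itertools

import numpy as np

# Single-qubit Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def generalized_paulis(n):
    """All n-fold tensor products of I, X, Y, Z, omitting the identity."""
    paulis = []
    for factors in itertools.product([I2, X, Y, Z], repeat=n):
        sigma = factors[0]
        for factor in factors[1:]:
            sigma = np.kron(sigma, factor)
        paulis.append(sigma)
    return paulis[1:]  # the first product is the all-identity term


def unitary_exp(H, dt):
    """Return exp(-i H dt) for Hermitian H via an eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * evals * dt)) @ evecs.conj().T


def synthesize(gamma, n, steps=400):
    """Integrate dV/dt = -i H(t) V from V(0) = I up to t = 1."""
    paulis = generalized_paulis(n)
    V = np.eye(2**n, dtype=complex)
    dt = 1.0 / steps
    for k in range(steps):
        coeffs = gamma((k + 0.5) * dt)  # sample H at the slice midpoint
        H = sum(c * s for c, s in zip(coeffs, paulis))
        V = unitary_exp(H, dt) @ V
    return V


def example_gamma(t):
    """Hypothetical constant control: H(t) = (pi/2) X on a single qubit."""
    return np.array([np.pi / 2, 0.0, 0.0])


V1 = synthesize(example_gamma, n=1)
print(np.round(V1, 6))
```

With this constant control the synthesized unitary is exp(−i(π/2)X) = −iX, which has unit determinant and so lies in SU(2).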

B. Cost functions and right-invariant local metrics 
on manifolds 

Our goal is to determine the most efficient way of
generating U. To make the notion of efficiency precise, we
introduce the cost c_f(γ) associated to a valid control
function γ,

    c_f(\gamma) = \int_0^1 dt\, f(\gamma(t)),    (4)

where f : R^{4^n − 1} → R is a real-valued function of the
control function γ(t). We study below the properties
which f ought to have if c_f(γ) is to be a good measure
of efficiency.

We now define the cost c_f(U) of the unitary U as the
infimum of the cost c_f(γ) over all valid control functions
γ generating U,

    c_f(U) = \inf_\gamma c_f(\gamma).    (5)

Note that we will refer to all three of f, c_f(γ) and c_f(U)
as the cost function, depending on context.
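To make Equation (4) concrete, here is a small sketch that evaluates c_f(γ) by midpoint quadrature. The two choices of f (a 1-norm and a Euclidean norm) and the control function `gamma` are illustrative assumptions, not choices fixed by the text at this point.

```python
import numpy as np


def cost(f, gamma, steps=2000):
    """Approximate c_f(gamma) = int_0^1 f(gamma(t)) dt by the midpoint rule."""
    ts = (np.arange(steps) + 0.5) / steps
    return sum(f(gamma(t)) for t in ts) / steps


def f_one_norm(y):
    """Illustrative cost function: sum of absolute values of the entries."""
    return float(np.sum(np.abs(y)))


def f_two_norm(y):
    """Illustrative cost function: Euclidean norm of the entries."""
    return float(np.sqrt(np.sum(np.square(y))))


def gamma(t):
    """Hypothetical single-qubit control function (gamma^X, gamma^Y, gamma^Z)."""
    return np.array([np.cos(np.pi * t), np.sin(np.pi * t), 0.5])


c1 = cost(f_one_norm, gamma)
c2 = cost(f_two_norm, gamma)
print(c1, c2)
```

For this control function the two integrals can be done by hand: the 1-norm cost is 4/π + 1/2 and the Euclidean cost is √(5/4), which gives a quick check on the quadrature.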

The remainder of this subsection is devoted to arguing
that the cost function f is equivalent to a geometric
object known as a right-invariant local metric on the
manifold SU(2^n).

To make this argument, in Section III B 1 we begin by noting a
few properties that f ought to have, if c_f(γ) and c_f(U)
are to be good measures of cost. The purpose here is
simply to motivate the list of properties we will demand
of f, and so the discussion focuses on heuristic arguments
and intuition building, rather than on rigorous proofs.

Our list of desired properties for f in hand, in Section III B 2
we move to the framework of differential geometry, and
show that with these properties, f corresponds to a
right-invariant local metric on SU(2^n). The advantage of
moving to this geometric viewpoint is that it allows the
well-developed tools and viewpoint of geometry to be applied.

1. Desired properties of the cost function f 

Continuity: A background assumption useful in our
later arguments is that f be continuous. Obviously this
is reasonable on physical grounds.

Positivity: Given the interpretation of c_f(U) as the
cost of synthesizing U, we expect that c_f(U) ≥ 0, with
equality if and only if U = I, the identity operation.
Using the continuity of f, it is straightforward to see
that this is equivalent to the condition f(y) ≥ 0, with
equality if and only if y = 0.

Positive homogeneity: Physically, if we double the
intensity of the Hamiltonian for a while, but halve the
time it is applied, we wouldn't expect the cost to change,
as the total effort required is the same. Mathematically
this idea may be expressed by the requirement that f
be positively homogeneous, i.e., that f(αy) = αf(y), for
any positive real number α, and any vector y.

Achievement of the infimum: Another useful background
assumption is that the infimum in the definition
of c_f(U) is achieved by some valid control function γ.
This is not strictly necessary for the arguments we make
below, but it does streamline them.

The triangle inequality: We will argue that f ought
to satisfy the triangle inequality, f(x + y) ≤ f(x) + f(y).
Suppose γ is the control function which minimizes c_f(U).
Fix t, and suppose there exist x and y such that γ(t) =
x + y and f(x + y) > f(x) + f(y). Choose a value Δ > 0
sufficiently small that f(γ(s)) is effectively constant
over the interval s ∈ [t, t + Δ]. We construct a new
modified control function γ_1 which takes the value 2x
on the interval [t, t + Δ/2], the value 2y on the interval
[t + Δ/2, t + Δ], and otherwise takes the same values as
γ. Note that this function is not valid, since it is neither
smooth nor does it exactly satisfy Equations (2) and (3).
However, it is easy to regularize γ_1 to produce a control
function γ_2 that is valid, and has essentially the same
cost. It follows that c_f(γ_2) ≈ c_f(γ_1) < c_f(γ), which
contradicts the presumed minimality of γ. This suggests
that f should satisfy the triangle inequality f(x + y) ≤
f(x) + f(y) for all x and y.

It is worth noting the additional point that if we desire
our minimal curves to be unique then the triangle
inequality needs to be strict, i.e., f(x + y) ≤ f(x) + f(y),
with equality if and only if x and y are along the same ray
emanating from the origin. If the strict triangle inequality
is not satisfied, then γ could be modified to produce a
different valid control function γ_2 with the same cost, using
an argument similar to that above. This observation
will be useful later, in our discussion of Finsler geometry.

Summarizing, we have argued that the cost function
ought to satisfy the conditions: f(y) ≥ 0 with equality
iff y = 0; f is positively homogeneous, i.e., f(αy) =
αf(y) for all positive α; and the triangle inequality
f(x + y) ≤ f(x) + f(y). We now show that these conditions
imply a correspondence between f and right-invariant
local metrics on SU(2^n).
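The switching argument behind the triangle inequality can be checked with a few lines of arithmetic. The sketch below compares the cost of holding the control at x + y for a time Δ against the cost of applying 2x for Δ/2 followed by 2y for Δ/2. The function `f_nonconvex` is a hypothetical example, chosen here because it is positively homogeneous yet violates the triangle inequality; it is not a cost function proposed in the text.

```python
import numpy as np


def f_nonconvex(y):
    """Positively homogeneous, but violates the triangle inequality."""
    return float(np.sum(np.sqrt(np.abs(y))) ** 2)


def f_euclid(y):
    """Euclidean norm: homogeneous and satisfies the triangle inequality."""
    return float(np.linalg.norm(y))


def held_cost(f, x, y, delta):
    """Cost of applying x + y for a time delta."""
    return delta * f(x + y)


def switched_cost(f, x, y, delta):
    """Cost of applying 2x for delta/2, then 2y for delta/2."""
    return 0.5 * delta * f(2 * x) + 0.5 * delta * f(2 * y)


x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
delta = 0.01

for f in (f_nonconvex, f_euclid):
    print(f.__name__, held_cost(f, x, y, delta), switched_cost(f, x, y, delta))
```

For `f_nonconvex` switching halves the cost, so a minimizing control function could always be improved. For the Euclidean norm switching is more expensive, as the triangle inequality guarantees.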



2. Geometric reformulation of the cost function 

We assume the reader is familiar with the concepts of
elementary differential geometry, and merely review the
necessary notation and nomenclature. The reader unfamiliar
with any of these concepts is advised to consult
an introductory text such as Isham [42] or Lee [43]. It
will also help to have some familiarity with Riemannian
geometry (also covered, albeit rather more briefly, in
those texts), but this is not as essential.

We denote a smooth (i.e., C^∞) n-dimensional manifold
by M; we typically omit "smooth" and just refer to M as
a manifold. We will often denote points on M by x, and
local co-ordinate systems by φ : S → R^n, where S is an
open subset of M, and φ is a homeomorphism of S into
a subset of R^n. The tangent space to M at point x is
denoted T_x M, and we often use y to denote a vector in a
tangent space such as T_x M. We write (x, y) to denote an
element of the tangent bundle TM. We will have much
interest in C^∞ maps f : M → N between manifolds M
and N, where by C^∞ we mean that all the derivatives of
the map f exist and are continuous with respect to any
local co-ordinate systems on M and N. We will use the
term C^∞ interchangeably with the term "smooth". A
curve on M is a smooth map s : I → M, where I is an
interval in R. Given such a smooth map f : M → N, and
fixing x ∈ M, we use f_* : T_x M → T_{f(x)} N to denote the
natural pushforward map connecting the tangent spaces
T_x M and T_{f(x)} N.

We define a manifold with local metric as a manifold M
equipped with a function F : TM → [0, ∞) such that for
each fixed x, the function F(x, y) satisfies: F(x, y) ≥ 0
with equality iff y = 0; F(x, y) is positively homogeneous
in y; and F(x, y) satisfies the triangle inequality in the
second variable. F is called the local metric.

So far as I am aware, "local metric" is not a standard
term in geometry. In this paper, however, we'll
find it a useful unifying term that can be specialized to
give the standard concepts of a Finsler or Riemannian
metric.

We define the length of a curve s : I → M on a manifold
M with local metric F by

    l_F(s) = \int_I dt\, F(s(t), [s]_t),    (6)

where [s]_t ∈ T_{s(t)} M is the tangent to s at s(t). With this
definition the length is invariant under reparameterization
of the curve. More precisely, suppose φ : I' → I is
a smooth, strictly monotone increasing function taking
an interval I' onto the interval I. Then the identity
l_F(s) = l_F(s ∘ φ) follows easily from the definition of
length, the positive homogeneity of the local metric, i.e.,
F(x, αy) = αF(x, y), elementary differential geometry,
and calculus.
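The reparameterization identity can be spelled out in one line. The following is a sketch, under the labelling assumption (made here for definiteness) that φ sends the new parameter t' ∈ I' to the old parameter t = φ(t') ∈ I, so that s ∘ φ is a curve on I'. It uses the chain rule [s ∘ φ]_{t'} = φ'(t') [s]_{φ(t')}, positive homogeneity with α = φ'(t') ≥ 0, and the change of variables t = φ(t'):

```latex
\begin{align*}
l_F(s \circ \varphi)
  &= \int_{I'} dt'\, F\big(s(\varphi(t')),\, \varphi'(t')\,[s]_{\varphi(t')}\big) \\
  &= \int_{I'} dt'\, \varphi'(t')\, F\big(s(\varphi(t')),\, [s]_{\varphi(t')}\big) \\
  &= \int_{I} dt\, F\big(s(t),\, [s]_t\big) \;=\; l_F(s).
\end{align*}
```

At any isolated point where φ'(t') = 0 both integrands vanish, since F(x, 0) = 0, so the second equality still holds there.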

We define the distance d_F(x, x') between two points x
and x' on M as the infimum of l_F(s) over all curves s
connecting x and x'.

To explain the connection between local metrics and
the cost function, we need to introduce locally adapted
co-ordinates on the manifold SU(2^n). We define these
co-ordinates as follows. First, fix an origin U ∈ SU(2^n),
and define ψ : R^{4^n − 1} → SU(2^n) by ψ(x) = exp(−i x·σ)U,
where σ here is the (4^n − 1)-component vector whose
entries are the generalized Pauli matrices. This maps
R^{4^n − 1} onto SU(2^n) in a many-to-one fashion. Supposing
S is an open subset of R^{4^n − 1} such that ψ : S → SU(2^n) is
one-to-one, we define a set of U-local adapted co-ordinates
to be the inverse function φ : ψ(S) → S.

In practice, we shall only be interested in U-local
adapted co-ordinates for unitary matrices V in some
small neighbourhood of U. In such a neighbourhood we
may simply define

    \phi(V) \cdot \sigma = i \ln(V U^\dagger),    (7)

where ln is the standard branch of the logarithm. We call
this co-ordinate system the U-local adapted co-ordinates.
Note that we will use the terms "U-local adapted
co-ordinates" and "local adapted co-ordinates" interchangeably,
with the former preferred when we wish to be specific
about the identity of the origin, and the latter preferred
when we wish to omit specific identification of the
origin.

Any co-ordinate system x^σ for an open neighbourhood
of U ∈ SU(2^n) induces a corresponding natural
co-ordinate system for the tangent space T_U SU(2^n). This
is done by singling out the natural basis (∂/∂x^σ)_U for
T_U SU(2^n), and expanding an arbitrary tangent vector
y ∈ T_U SU(2^n) as y = Σ_σ y^σ (∂/∂x^σ)_U. We refer to the
y^σ as the natural co-ordinates for y with respect to the
co-ordinate system x^σ. We refer to the co-ordinate system
(x^σ, y^σ) for TSU(2^n) as a natural co-ordinate system for
the tangent bundle TSU(2^n).

Suppose now that V is a unitary for which U-local
adapted co-ordinates are defined. Then those
co-ordinates give rise to a set of natural co-ordinates for
T_V SU(2^n), which we call the natural U-adapted
co-ordinates, or just the natural adapted co-ordinates, when
it is clear what value U takes. The following proposition
gives a way of computing the natural adapted
co-ordinates for the vector tangent to a curve.

Proposition 1. Let U(t) be a smooth curve in SU(2^n).
Then (a) i (dU/dt) U† is Hermitian, and (b) the natural
U(t)-adapted co-ordinates y^σ for the tangent to the curve,
[U]_t ∈ T_{U(t)} SU(2^n), are determined by the equation
(footnote 8)

    y \cdot \sigma = i \frac{dU}{dt} U^\dagger.    (8)

In particular, we have y^σ = i tr(σ (dU/dt) U†)/2^n.

Proof: There is a sense in which (a) follows from (b),
since the y^σ are real, by definition. Nonetheless, it seems
worthwhile to include the following brief proof of (a).
Using the unitarity of U(t) we have

    0 = \frac{d(U U^\dagger)}{dt} = \frac{dU}{dt} U^\dagger + U \frac{dU^\dagger}{dt}.    (9)



8 Note the convention, used here and throughout, that the σ in
y^σ refers to a specific generalized Pauli matrix, while in y · σ it
refers to the entire vector of generalized Paulis.



Thus (dU/dt)U† is anti-Hermitian, which proves part (a).
To prove (b), we expand

    U(t + \Delta) = U(t) + \Delta \frac{dU}{dt} + O(\Delta^2)    (10)
                  = \exp\left( -i \Delta \left[ i \frac{dU}{dt} U^\dagger \right] \right) U(t) + O(\Delta^2).    (11)

It follows that the natural U(t)-adapted co-ordinates of
the tangent [U]_t are determined by Equation (8). □
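Proposition 1 is easy to test numerically. The sketch below differentiates a curve in SU(2) by central differences, forms M = i (dU/dt) U†, and reads off y^σ = tr(σM)/2^n. The curve and its generators `H1`, `H2` are hypothetical choices made only for this check.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [X, Y, Z]

# Hypothetical generators for the test curve (n = 1 qubit).
H1 = 0.3 * X - 0.7 * Z
H2 = 0.5 * Y


def unitary_exp(H, t):
    """Return exp(-i H t) for Hermitian H via an eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T


def curve(t):
    """A smooth curve U(t) = exp(-i t H1) exp(-i t^2 H2) in SU(2)."""
    return unitary_exp(H1, t) @ unitary_exp(H2, t**2)


def tangent_coordinates(U, t, h=1e-5):
    """Return (y, M) with M = i (dU/dt) U^dagger and y^sigma = tr(sigma M) / 2^n."""
    dU = (U(t + h) - U(t - h)) / (2 * h)  # central-difference derivative
    M = 1j * dU @ U(t).conj().T
    dim = M.shape[0]  # dim = 2^n
    y = np.array([np.trace(s @ M) / dim for s in PAULIS])
    return y, M


y0, M0 = tangent_coordinates(curve, 0.0)
y1, M1 = tangent_coordinates(curve, 0.8)
print(np.round(y0, 6))
```

At t = 0 the second factor of the curve contributes nothing to the derivative, so the co-ordinates reduce to those of `H1`, namely (0.3, 0, −0.7).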

Suppose now that we define a local metric F on the
manifold SU(2^n) by F(U, y) = f(γ), where f is a cost
function, and γ is the vector whose co-ordinates are the
natural U-adapted co-ordinates of y. With this definition,
the conditions for F to be a local metric follow
immediately from the conditions we obtained earlier for
the cost function f: positivity, positive homogeneity,
and the triangle inequality.

Furthermore, observe from Proposition 1 that if V(t)
is a solution to Equations (2) and (3), then the natural
V(t)-adapted co-ordinates for the tangent to the curve V
at the point V(t) are just the control functions γ(t). It
follows that the cost c_f(γ) is equal to the length l_F(V) of
the curve V on the manifold SU(2^n), and therefore that
the cost c_f(U) is equal to the distance d_F(I, U) between
the identity operation I and the unitary U.

We can also show that F is an example of a special type
of local metric known as a right-invariant local metric. In
general, suppose G is any Lie group (such as, for example,
SU(2^n)), and F : TG → [0, ∞) is a local metric defined
on G. Then F is right-invariant if F(x, r_{x*}(y)) = F(e, y),
where e is the Lie group identity, r_x is right multiplication
by x, i.e., r_x(x') = x'x, and r_{x*} is the pushforward of r_x
at e, mapping T_e G to T_x G. With this definition, we see
that right-invariant local metrics are simply those which
are constant in natural adapted co-ordinates, and thus
our F is an example of a right-invariant local metric.
We note for later use that left-invariant local metrics are
defined similarly to right-invariant local metrics, except
the invariance is now under left multiplication, and that
a local metric which is both left- and right-invariant is
said to be bi-invariant.

Summarizing, we have argued on general grounds that
the cost associated to the synthesis of a unitary operation
U ought to be given by the distance d_F(I, U) between the
identity I and U, for some right-invariant local metric F.



C. Minimal curves and lower bounds 

In this subsection we prove a simple theorem relating
geometry to minimal size quantum circuits, giving
sufficient conditions for a local metric F to satisfy
d_F(I, U) ≤ m_𝒢(U).

To state the theorem, it helps to first introduce a little
more notation and nomenclature. Suppose 𝒢 is some set
of unitary gates which is exactly universal when acting
on n qubits. For example, 𝒢 might consist of all one-
and two-qubit unitary gates which can be written in the
form exp(−iασ), where 0 < α < 1 and σ is either a
single-qubit Pauli, or a two-qubit Pauli.

Suppose ℋ is a set of Hermitian matrices such that the
map ℋ → 𝒢 defined by H → exp(−iH) is one-to-one and
onto. As an example corresponding to the set 𝒢 defined
in the previous paragraph, we have ℋ consisting of all
Hermitian matrices of the form ασ, where α and σ are
as in the definition of 𝒢.

On the tangent bundle TSU(2^n) we write (V, H) to
denote the pair consisting of V ∈ SU(2^n) and the tangent
vector [exp(−iHt)V]_{t=0} ∈ T_V SU(2^n). This notation
should not be confused with the similar notation
(V, y) for elements of TSU(2^n) that we've used up to
now, and will continue to use when appropriate, where
V ∈ SU(2^n) and y ∈ T_V SU(2^n). The advantage of the
new notation is that it allows us to write F(V, H) to denote
the cost of applying the Hamiltonian H at the point
V ∈ SU(2^n).

Finally, suppose F is a local metric on SU(2^n) satisfying
F(V, H) ≤ 1 for all V in SU(2^n), and for all H in
ℋ. Then we say that F is 𝒢-bounding. The reason for
this nomenclature is provided by the following theorem,
which shows that whenever F is 𝒢-bounding, d_F(I, U)
provides a lower bound on m_𝒢(U).

Theorem 1. Suppose 𝒢 is an exactly universal gate set
on SU(2^n), and ℋ is a corresponding set of Hermitian
matrices, as described above. Suppose F is a 𝒢-bounding
local metric on SU(2^n). Then for any fixed U in SU(2^n)
the inequality

    d_F(I, U) \le m_{\mathcal{G}}(U)    (12)

holds.

Proof: Suppose that U_1 = exp(−iH_1), ..., U_{m_𝒢(U)} =
exp(−iH_{m_𝒢(U)}) is a minimal sequence of quantum gates
synthesizing U, where the gates are chosen from 𝒢. We
define a curve V(t) between I and U by defining a control
function induced by this gate sequence. The definition of
the control function is:

    \frac{\gamma(t) \cdot \sigma}{m_{\mathcal{G}}(U)} =
    \begin{cases}
      H_1 & \text{if } 0 \le t < 1/m_{\mathcal{G}}(U) \\
      H_2 & \text{if } 1/m_{\mathcal{G}}(U) \le t < 2/m_{\mathcal{G}}(U) \\
      \vdots \\
      H_{m_{\mathcal{G}}(U)} & \text{if } 1 - 1/m_{\mathcal{G}}(U) \le t \le 1.
    \end{cases}    (13)

This control function gives rise to a curve between I and
U by integrating Equations (2) and (3). However, the
curve is not smooth, and so the control function is not
valid. To correct this we regularize γ to produce a smooth
control function γ_1 that also generates U.

We do this regularization using a real-valued smooth
function r(t) with the properties that: (a) r(t) = 0 for
any point t which is an integer multiple of 1/m_𝒢(U); (b)
r(t) ≥ 0; and (c) for any integer j the integral of r(t)
over the interval [j/m_𝒢(U), (j + 1)/m_𝒢(U)] is 1/m_𝒢(U).
Such a function is easily constructed using the standard
techniques of analysis.



We now define a modified control function γ_1(t) =
r(t)γ(t), and the corresponding curve V(t) is defined by
integrating Equations (2) and (3). This is now a smooth
curve connecting I and U, and the length of the curve is:

    l_F(V) = \int_0^1 dt\, F(V(t), \gamma_1(t) \cdot \sigma)          (14)
           = \int_0^1 dt\, r(t)\, F(V(t), \gamma(t) \cdot \sigma)     (15)
           \le \int_0^1 dt\, r(t)\, m_{\mathcal{G}}(U)                (16)
           = m_{\mathcal{G}}(U),                                      (17)

where we have applied, respectively: the definition of
length; the property that a local metric is positively
homogeneous in the second variable; the fact that γ(t) ·
σ/m_𝒢(U) is in ℋ, and the assumption F(V, H) ≤ 1 for
all V ∈ SU(2^n) and H ∈ ℋ; and, finally, the fact that
for any integer j the integral of r(t) over the interval
[j/m_𝒢(U), (j + 1)/m_𝒢(U)] is 1/m_𝒢(U). It follows that
d_F(I, U) ≤ m_𝒢(U), as claimed. □
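One way to build a regularizing function r(t) with properties (a) to (c) is from a standard bump function, which vanishes to all orders at the endpoints of each slot so that r(t)γ(t) is smooth across gate boundaries. The sketch below is one such construction, offered as an assumption rather than the construction intended in the proof. It then evaluates the length integral (15) for a hypothetical two-gate circuit exp(−iα_j σ_j), using the 1-norm of the Pauli coefficients as the local metric, for which F(α_j σ_j) = α_j.

```python
import numpy as np


def bump(s):
    """exp(-1/(s(1-s))) on (0, 1), zero elsewhere; flat to all orders at 0 and 1."""
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    inside = (s > 0.0) & (s < 1.0)
    out[inside] = np.exp(-1.0 / (s[inside] * (1.0 - s[inside])))
    return out


# Normalization: integral of bump over [0, 1], by the midpoint rule.
_GRID = (np.arange(200_000) + 0.5) / 200_000
BUMP_INTEGRAL = bump(_GRID).mean()


def r(t, m):
    """Regularizer for m gates: zero at multiples of 1/m, integral 1/m per slot."""
    s = np.mod(np.asarray(t, dtype=float) * m, 1.0)
    return bump(s) / BUMP_INTEGRAL


def regularized_length(alphas, steps=200_000):
    """Length of the regularized curve for gates exp(-i alpha_j sigma_j)."""
    m = len(alphas)
    ts = (np.arange(steps) + 0.5) / steps
    slot = np.minimum((ts * m).astype(int), m - 1)  # index of the active gate
    speed = m * np.asarray(alphas, dtype=float)[slot]  # F(gamma(t) . sigma)
    return float(np.mean(r(ts, m) * speed))


alphas = [0.5, 0.8]  # hypothetical two-gate circuit, so m = 2
length = regularized_length(alphas)
print(length)
```

The computed length is α_1 + α_2 = 1.3, which is below the gate count m = 2, in line with inequality (12).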

D. Geodesics and Finsler geometry

We have argued that the cost function f corresponds to
a right-invariant 𝒢-bounding local metric F on SU(2^n).
In this subsection we will argue that if we are to study the
function d_F(I, U) using the calculus of variations, then F
ought to belong to a special class of local metrics known
as Finsler metrics.

To see this, we again start with some heuristic motivating
arguments regarding the properties of f, before
turning to a discussion of what these properties mean
geometrically, i.e., in terms of the local metric F.

Smoothness: In order to apply the calculus of variations,
we need to make some smoothness assumptions
about the cost function f. Although it is not strictly
necessary, we will assume that the cost function is
differentiable to all orders away from the origin, i.e., f(y) is
a C^∞ function, except at the origin. The reason we exclude
the origin from the smoothness requirement is that
if f is non-negative and positively homogeneous, as we
argued it ought to be earlier, then the only way f can be
differentiable at the origin is if it vanishes everywhere.

The strict triangle inequality and the Hessian:
We argued earlier that f ought to satisfy the strict triangle
inequality. Our assumption that f is also smooth
enables us to recast the strict triangle inequality in a
more convenient and (almost) equivalent form. We define
the (4^n − 1) × (4^n − 1) Hessian matrix whose entries
are H_{στ} = (1/2) ∂²f²/∂y^σ ∂y^τ, where y^σ is our notation for
the σth co-ordinate slot in the function f². It turns out
that a necessary condition for the triangle inequality to
hold is that the Hessian matrix be a positive matrix. A
sufficient condition for the strict triangle inequality to
hold is that the Hessian matrix be strictly positive, and
this is the condition we shall impose on f.
These conditions are well-understood in the Finsler geometry
community, and so we merely outline why these
facts are the case, omitting the details. The interested
reader is referred to, e.g., Section 1.2 of the book by Bao,
Chern and Shen [44] for a more detailed discussion. Our
reason for including this brief discussion here is partially
motivational, but also because many of the ideas introduced
will be needed later, when we discuss the approximation
of local metrics by Finsler metrics.

The key concept behind these results is that of the
indicatrix. The indicatrix of f, denoted S_f, is defined to
consist of all those points y such that f(y) = 1. The
indicatrix generalizes the unit sphere, where f is the
Euclidean norm function. We define the unit ball B_f for f
to consist of all those points y such that f(y) ≤ 1.

Assuming f is positively homogeneous, it is not difficult
to show that the triangle inequality f(x + y) ≤
f(x) + f(y) is equivalent to the condition that the unit
ball B_f be convex. Under the same condition, the strict
triangle inequality is easily seen to be equivalent to the
condition that B_f be strictly convex, i.e., any line joining
two points of B_f should be contained entirely within the
interior of B_f, except possibly at the endpoints. Equivalently,
the tangent hyperplane to B_f at any point of S_f
should only touch a single point of B_f.

How does this geometry relate to the Hessian? Suppose
we pick a point y_0 on the indicatrix. Consider the tangent
plane, defined to consist of those points y satisfying ∇f ·
y = ∇f · y_0, where the gradient is evaluated at y_0.
Define Δ = y − y_0. Expanding f(y) = f(y_0 +
Δ) in a Taylor series in Δ and doing some elementary
manipulations gives

    f(y_0 + \Delta) = 1 + \tfrac{1}{2} \Delta^T H \Delta + O(\Delta^3),    (18)

where Δ^T indicates the transpose of Δ, and H is the
Hessian matrix. Thus, provided the Hessian is strictly
positive, it follows that y_0 is the only point in the tangent
hyperplane which is also in B_f, and so the indicatrix
is strictly convex, and the strict triangle inequality is
satisfied.

The standard terminology is that f is strongly convex
when the Hessian is strictly positive; this is a stronger
condition than strict convexity of f. It is not difficult
to find examples where the Hessian is only positive, not
positive definite, and yet the strict triangle inequality
holds. Essentially, at such points the quadratic terms in
f(y_0 + Δ) may vanish, yet we still have f(y_0 + Δ) > 1, due
to the contribution of higher-order terms, ensuring the
strict triangle inequality holds. See, e.g., Exercise 1.2.7
of [44].

We will see that there are significant advantages to 
assuming that the Hessian is strictly positive, i.e., that 
strong convexity holds. In particular, in Section lnTl we'll 
see that this is exactly the condition needed to make the 
geodesic equation a second order differential equation. If 
the indicatrix is strictly but not strongly convex then the 
geodesic equation is not a second order differential equa- 
tion, but one must instead go to higher order equations, 
which substantially complicates the study of geodesics.
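The difference between a positive and a strictly positive Hessian can be seen numerically. The sketch below estimates H_jk = (1/2) ∂²f²/∂y^j ∂y^k by central differences for two illustrative cost functions: a weighted Euclidean norm, whose Hessian is the diagonal matrix of weights, and a 1-norm, whose Hessian away from the co-ordinate axes is the rank-one matrix s s^T with s the vector of signs of y. The function names, the weights, and the test point are assumptions for this example.

```python
import numpy as np


def hessian_of_half_f_squared(f, y, h=1e-4):
    """Central-difference estimate of H_jk = (1/2) d^2 f^2 / dy^j dy^k at y."""
    y = np.asarray(y, dtype=float)
    d = y.size

    def g(v):
        return 0.5 * f(v) ** 2

    H = np.zeros((d, d))
    for j in range(d):
        for k in range(d):
            ej = np.zeros(d)
            ek = np.zeros(d)
            ej[j] = h
            ek[k] = h
            H[j, k] = (g(y + ej + ek) - g(y + ej - ek)
                       - g(y - ej + ek) + g(y - ej - ek)) / (4 * h * h)
    return H


WEIGHTS = np.array([1.0, 4.0, 4.0])  # hypothetical penalties


def f_weighted(y):
    """Weighted Euclidean norm: strongly convex."""
    return float(np.sqrt(np.sum(WEIGHTS * y * y)))


def f_one_norm(y):
    """1-norm: convex, but its Hessian is only positive semidefinite."""
    return float(np.sum(np.abs(y)))


y = np.array([0.3, -0.5, 0.2])
H_weighted = hessian_of_half_f_squared(f_weighted, y)
H_one = hessian_of_half_f_squared(f_one_norm, y)
print(np.linalg.eigvalsh(H_weighted), np.linalg.eigvalsh(H_one))
```

The weighted norm has all Hessian eigenvalues bounded away from zero, while the 1-norm has zero eigenvalues, so it is not strongly convex.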



Summarizing, we have argued that the cost function f
ought to be smooth away from the origin, and the Hessian
of f ought to be strictly positive definite. We now show
that this means that the corresponding local metric F on
SU(2^n) is a Finsler metric.

Finsler geometry is a well-developed subject, and our
treatment here is based on the standard text by Bao,
Chern and Shen [44], and on the notes of Alvarez and
Duran [?], to which the reader should refer for more
details.

To define Finsler metrics it helps to first define the notion
of a Minkowski norm, which is a function N : R^d →
[0, ∞) which is smooth away from the origin, satisfies
N(y) ≥ 0 with equality if and only if y = 0, is positively
homogeneous, and strongly convex in the same sense
described earlier, i.e., the Hessian matrix H = (H_{jk}) whose
components are the partial derivatives

    H_{jk} = \frac{1}{2} \frac{\partial^2 N^2}{\partial y^j \partial y^k}    (19)

is strictly positive when evaluated at any nonzero point y ∈ R^d.

Informally, a Finsler metric is a family of Minkowski
norms on the tangent spaces to the manifold, one norm
for each point on the manifold, and such that the
Minkowski norms vary smoothly as a function of position
on the manifold. More precisely, a Finsler metric
on a manifold M is a function F : TM → [0, ∞) such
that F(x, y) is a smooth function of x and y for all x
and all y ≠ 0, and such that for each fixed x, F(x, ·) is a
Minkowski norm on the tangent space T_x M.

Clearly, Finsler metrics are a special case of local metrics.
Note also that Riemannian metrics are a special
case of Finsler metrics, coinciding with the condition that
F(x, ·)² be a quadratic form for each fixed x ∈ M.

When the cost function f is smooth away from the origin
and has a strictly positive Hessian, we see that the
corresponding local metric (footnote 9) F(U, y) = f(γ) is an
example of a right-invariant Finsler metric. Thus, the local
metrics we shall be most interested in studying in this
paper are right-invariant 𝒢-bounding Finsler metrics.

Finsler metrics have a number of useful properties that
we note here without proof.

First, the Hopf-Rinow theorem (see page 168 and exercise
6.2.11 on page 155 of [44]) implies that for a compact
Finsler manifold the infimum in the definition of d_F is
always achieved by some smooth curve.

Second, Euler's theorem for smooth and homogeneous
functions (see Section 1.2 of [44]) implies a number of
useful identities satisfied by Finsler metrics:

    \sum_j y^j \frac{\partial F^2}{\partial y^j} = 2F^2                                        (20)
    \sum_{jk} y^j y^k \frac{\partial^2 F^2}{\partial y^j \partial y^k} = 2F^2                  (21)
    \sum_j y^j \frac{\partial^3 F^2}{\partial y^j \partial y^k \partial y^l} = 0.              (22)

In these equations, the y^j are any fixed set of co-ordinates
for the tangent space T_x M. Note that Equation (22) can
be recast in several different ways, depending on which
order the partial derivatives are taken. We use several of
these different orderings later.

9 Recall that the components of γ are just the natural U-adapted
co-ordinates for y.



E. Examples of local metrics whose minimal length
curves provide lower bounds on circuit size

We have argued that if F is to be useful for proving
lower bounds on m_𝒢(U) then it ought to be a 𝒢-bounding
local metric; even better, a Finsler metric, in order that
the calculus of variations and results like the Hopf-Rinow
theorem be applicable. In addition to these properties,
a local metric F ideally should have the following three
properties: (1) it is easy to determine the minimal curve
length d_F(I, U); (2) there exist families of unitaries indexed
by n and with long minimal geodesics according
to F, i.e., families of unitaries for which d_F(I, U) scales
exponentially with n; and (3) d_F(I, U) is polynomially
equivalent to m_𝒢(U). Note that (3) implies (2), since
unitaries for which m_𝒢(U) is exponential are known to
exist [?].

It is a significant open problem to find a local metric
satisfying all of these properties.

The purpose of the present subsection is to introduce
four natural candidates for such a local metric, denoted
F_1, F_2, F_p, and F_q, and to discuss the extent to which
they satisfy these desired properties. All of them are
right-invariant 𝒢-bounding local metrics, and thus at
least satisfy the inequality d_F(I, U) ≤ m_𝒢(U).

We will see that one of these local metrics, F_2, definitely
does not have all the desired properties. Although
our discussion is not conclusive, it is plausible that each
of the other three local metrics does possess the desired
properties, with the caveat that F_1 and F_p are not
Finsler, and must be approximated by suitable Finsler
metrics. In particular, we will present some heuristic
evidence that F_1 and F_p satisfy all our criteria.

To define our local metrics let U ∈ SU(2^n), y ∈
T_U SU(2^n), and suppose y has natural U-adapted
co-ordinates y^σ, and so can be thought of as corresponding
to the Hamiltonian y · σ. Then we define

    F_1(U, y) = \sum_\sigma |y^\sigma|                                                   (23)
    F_2(U, y) = \sqrt{\sum_\sigma (y^\sigma)^2}                                          (24)
    F_p(U, y) = \sum_\sigma p(\mathrm{wt}(\sigma))\, |y^\sigma|                          (25)
    F_q(U, y) = \sqrt{\sum_\sigma q(\mathrm{wt}(\sigma))\, (y^\sigma)^2}.                (26)


In these expressions, wt(σ) is the Hamming weight of
the Pauli matrix σ, and p(·) and q(·) are penalty functions
that penalize the control function whenever Pauli
terms of high weight contribute to the control Hamiltonian.
E.g., we might choose p(j) = 4^j to provide an
exponential penalty for the use of higher-weight Pauli
matrices. We return to the choice of the penalty function
below.
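The four candidate metrics are simple to evaluate once a tangent vector is given by its Pauli coefficients. The sketch below is one possible encoding, assumed for illustration: a vector y is a dictionary from Pauli labels such as "XII" to real coefficients y^σ, the 1-norm-type metrics sum absolute values, and the 2-norm-type metrics take a square root of a weighted sum of squares. The penalty p(j) = 4^j is the example from the text; the penalty q with value 100 for weights above two is a hypothetical choice.

```python
import math


def weight(label):
    """Hamming weight: the number of non-identity tensor factors."""
    return sum(ch != "I" for ch in label)


def F1(y):
    return sum(abs(c) for c in y.values())


def F2(y):
    return math.sqrt(sum(c * c for c in y.values()))


def Fp(y, p):
    return sum(p(weight(s)) * abs(c) for s, c in y.items())


def Fq(y, q):
    return math.sqrt(sum(q(weight(s)) * c * c for s, c in y.items()))


def p_exponential(j):
    """Penalty p(j) = 4^j, as in the example in the text."""
    return 4.0 ** j


def q_threshold(j, k=100.0):
    """Hypothetical penalty: 1 for weight one or two, k otherwise."""
    return 1.0 if j in (1, 2) else k


# Hypothetical three-qubit tangent vector.
y_example = {"XII": 0.6, "IZZ": -0.8, "XYZ": 0.5}

values = (F1(y_example), F2(y_example),
          Fp(y_example, p_exponential), Fq(y_example, q_threshold))
print(values)
```

Only the weight-three term "XYZ" is penalized by `q_threshold`, which is why the fourth value is dominated by that single coefficient.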

As was remarked earlier, it is often useful to write
F(U, H) = F(U, y), where H is the Hamiltonian such
that y = [exp(−iHt)U]_{t=0}. With this convention it is
easily verified that right-invariant local metrics such as
F_1, F_2, F_p and F_q have no U-dependence, and so we
sometimes write F(H) = F(·, H). Note that F(H) is a norm
on su(2^n).

We now apply Theorem 1 to these example metrics.
To do this, we choose the universal gate set 𝒢 as in the
example described earlier, specifically, to consist of all
one- and two-qubit unitary gates which can be written
in the form exp(−iασ), where 0 < α < 1 and σ is a
Pauli matrix of weight one or two. The corresponding ℋ
consists of all Hermitian matrices of the form ασ.

We see immediately that with these choices F_1 and
F_2 satisfy the hypothesis of Theorem 1; namely, we have
F_1(V, H) ≤ 1 and F_2(V, H) ≤ 1 for all V ∈ SU(2^n)
and all H ∈ ℋ. Thus, we have d_{F_1}(I, U) ≤ m_𝒢(U)
and d_{F_2}(I, U) ≤ m_𝒢(U). In the case of F_p and F_q we
need the supplementary assumptions that p(1), p(2) ≤ 1
and q(1), q(2) ≤ 1, respectively. With these assumptions
it is easily verified that the hypothesis of Theorem 1
holds, and so we have d_{F_p}(I, U) ≤ m_𝒢(U) and
d_{F_q}(I, U) ≤ m_𝒢(U).



1. Properties of F_2 and F_q

F_2 is an example of a very well understood class of
local metrics: it is a bi-invariant Riemannian metric on
SU(2^n). Such local metrics have the nice property that
their geodesics are completely understood (they are
the curves of the form exp(−iHt), where H ∈ su(2^n)),
as are properties such as curvature and other geometric
invariants. See, e.g., the end-of-chapter problems in
Chapter 5 of [?].

Although F_2 is well understood, it turns out that the
lower bounds on m_𝒢(U) obtained from F_2 are at best
constant, and thus are not especially interesting. In particular,
we will show that for any U we have d_{F_2}(I, U) ≤ π,
and thus the best possible bound we can hope for is
π ≤ m_𝒢(U).

To see that d_{F_2}(I, U) ≤ π, observe first the identity
F_2(V, H) = √(tr(H²)/2^n), which follows from Equation (24)
and tr(στ) = 2^n δ_{στ}. Then for any U, select Hermitian
H with eigenvalues in the range −π to π and
such that exp(−iH) = U. Define a curve γ from I to
U via γ(t) = exp(−iHt); this can actually be shown
to be the minimal length geodesic through U, although
we won't need this fact. The length of this curve is
l_{F_2}(γ) = ∫_0^1 dt F_2(exp(−iHt), H) = √(tr(H²)/2^n). It follows
that (footnote 10) d_{F_2}(I, U) ≤ √(tr(H²)/2^n). But from the eigenvalue
bounds on H it follows that tr(H²) ≤ π² 2^n, whence we
obtain

    d_{F_2}(I, U) \le \pi.    (27)

Thus we can never hope to prove any more than that
π ≤ m_𝒢(U) using F_2.
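The constant upper bound on the F_2 length of the curve exp(−iHt) can be checked numerically. The sketch below assumes F_2 is the normalized Hilbert-Schmidt norm, F_2(H) = √(tr(H²)/2^n), and that H is chosen with spectrum in (−π, π]; under those assumptions the length is the root-mean-square of the eigenphases of U, which cannot exceed π. The random sampling procedure (QR decomposition of a Gaussian matrix, followed by a determinant correction) is an illustrative choice.

```python
import numpy as np


def random_special_unitary(dim, rng):
    """Sample a unitary by QR of a complex Gaussian matrix, then fix det = 1."""
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    Q, _ = np.linalg.qr(A)
    return Q / np.linalg.det(Q) ** (1.0 / dim)


def f2_length_of_exponential_curve(U):
    """sqrt(tr(H^2) / 2^n) for Hermitian H with exp(-iH) = U, spectrum in (-pi, pi]."""
    theta = np.angle(np.linalg.eigvals(U))  # eigenvalues of U are exp(i theta)
    return float(np.sqrt(np.mean(theta**2)))  # H has eigenvalues -theta


rng = np.random.default_rng(0)
samples = [random_special_unitary(4, rng) for _ in range(5)]  # two qubits
lengths = [f2_length_of_exponential_curve(U) for U in samples]
extreme = f2_length_of_exponential_curve(-np.eye(2))  # -I lies in SU(2)
print(lengths, extreme)
```

The single-qubit unitary −I has both eigenphases equal to π in magnitude, so it attains the bound exactly.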

The essential reason F_2 is unsuitable for proving lower
bounds is that it contains no information about the tensor
product structure, as can be seen from the expression
F_2(H) = √(tr(H²)/2^n). How can we encode information
about the tensor product structure in the metric, in order
to have some hope of obtaining non-constant lower
bounds on circuit size? One possibility is to simply exclude
the possibility of the control Hamiltonian containing
Pauli terms of weight higher than two. To do this
we need to move to the field of sub-Riemannian geometry
[47], which is concerned with the situation where
there are restrictions on the allowed directions that a
curve may take in the tangent space. This direction of
research is under investigation by the author.

Another possible approach is to introduce a penalty 
function q(-) which penalizes the use of high weight Pauli 
matrices in the control Hamiltonian. Many forms for 
the penalty function suggest themselves, and it is not 
clear which, if any, is the most appropriate. Here is one 
illustrative choice: 

/ \ _ f 1 if j = 1 or 2 , . 

1 k otherwise, ^ ' 

where k is a penalty that may depend on the number 
of qubits n, but is otherwise constant. As k becomes 
large we expect that this approach will yield essentially 
the same geodesies as in the sub-Riemannian approach 
mentioned above. One advantage of using such a local 
metric is that it is a right-invariant Riemannian metric, 
and such local metrics are quite well understood. See, 
e.g., Appendix 2 of ^^|, and the end-of-chapter problems 
in Chapter 5 of [Z^. 
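
To make the role of the penalty concrete, here is a small Python sketch that expands a Hamiltonian in the Pauli basis and evaluates a penalized cost. It assumes the form $F_q(H) = \sqrt{\sum_\sigma q(\mathrm{wt}(\sigma)) h_\sigma^2}$, where the $h_\sigma$ are the Pauli coefficients of $H$ and $\mathrm{wt}(\sigma)$ is the number of qubits on which $\sigma$ acts non-trivially; this form, and the helper names, are assumptions made for illustration and should be checked against the definitions given earlier in the paper.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'XYZ' -> X (x) Y (x) Z."""
    out = np.array([[1.0 + 0j]])
    for ch in label:
        out = np.kron(out, PAULIS[ch])
    return out

def pauli_coefficients(h):
    """Coefficients h_sigma = tr(sigma H) / 2^n for every n-qubit Pauli label."""
    n = int(np.log2(h.shape[0]))
    coeffs = {}
    for letters in itertools.product("IXYZ", repeat=n):
        label = "".join(letters)
        coeffs[label] = np.trace(pauli_string(label) @ h).real / 2 ** n
    return coeffs

def weight(label):
    """Number of qubits on which the Pauli string acts non-trivially."""
    return sum(ch != "I" for ch in label)

def q_penalty(j, k):
    """The illustrative penalty of Equation (28)."""
    return 1.0 if j in (1, 2) else k

def f_q(h, k):
    """Assumed penalized cost sqrt(sum_sigma q(wt(sigma)) h_sigma^2)."""
    total = 0.0
    for label, c in pauli_coefficients(h).items():
        if weight(label) > 0:  # the identity component carries no cost
            total += q_penalty(weight(label), k) * c ** 2
    return np.sqrt(total)

# A three-qubit Hamiltonian with one weight-2 and one weight-3 term.
H = 0.3 * pauli_string("XXI") + 0.4 * pauli_string("XYZ")
print(f_q(H, k=1.0))    # no penalty: reduces to the unpenalized quadratic cost
print(f_q(H, k=100.0))  # the weight-3 term now dominates
```

With $k = 1$ the penalty is switched off; as $k$ grows, directions containing Pauli terms of weight three or more become increasingly expensive.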



10 As this is the minimal length curve, the inequality which follows
is, in fact, an equality.



2. Properties of $F_1$ and $F_p$

The local metric $F_1$ is perhaps the most promising
cost function, due to the following interpretation. Suppose
$U$ may be generated by applying sequentially the
Pauli matrices $\sigma_1, \sigma_2, \ldots$ for times $t_1, t_2, \ldots$. Then the
length $l_{F_1}$ of the corresponding curve is just the total time
$t_1 + t_2 + \ldots$. An approximate converse is also true. In
particular, using the Trotter formula it is easy to prove
that for any Hamiltonian $H$ and time $t > 0$ it is possible
to approximate $\exp(-iHt)$ arbitrarily well using just
Pauli Hamiltonians $\sigma$, applied for a total time $F_1(H)t$.
It follows that given a curve $\gamma$, we can approximate $\gamma$
arbitrarily well using a sequence of evolutions, each one
a Hamiltonian evolution with Hamiltonian some generalized
Pauli sigma matrix, with the total time of evolution
being just $l_{F_1}(\gamma)$. Note also that Hamiltonian evolution
according to any single generalized Pauli matrix is easily
simulatable with at most a linear number of gates in the
standard quantum circuits model.
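
The Trotter argument can be illustrated on a single qubit. The Python sketch below approximates $\exp(-i(aX + bZ)t)$ by alternating evolutions under $X$ and $Z$ alone. Evolving under $aX$ for time $t/m$ is the same as evolving under $\pm X$ for time $|a|t/m$, so the total evolution time is $(|a| + |b|)t$ regardless of the number of steps $m$. The function names and the particular values of $a$, $b$ and $t$ are illustrative choices.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(h, t):
    """exp(-i h t) for a Hermitian matrix h, via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

def trotter(a, b, t, steps):
    """First-order Trotter approximation to exp(-i (a X + b Z) t)."""
    # One step: evolve under a*X, then under b*Z, each for time t/steps.
    step = evolve(a * X, t / steps) @ evolve(b * Z, t / steps)
    return np.linalg.matrix_power(step, steps)

a, b, t = 0.7, -0.4, 1.0
exact = evolve(a * X + b * Z, t)
for m in (10, 100, 1000):
    error = np.linalg.norm(trotter(a, b, t, m) - exact, 2)
    print(m, error)  # the error shrinks roughly like 1/m
print("total Pauli evolution time:", (abs(a) + abs(b)) * t)
```

The approximation error falls as the number of switches grows while the total time stays fixed, which is the sense in which the time cost and the switching cost are separate resources.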

Thus, $d_{F_1}(I,U)$ has a natural interpretation as the
minimal time required to generate $U$ by switching between
Hamiltonians chosen from the set of generalized
Pauli matrices, each of which can be efficiently simulated
in the standard quantum circuits model.

It is tempting to suppose on the basis of this interpretation
that $d_{F_1}(I,U)$ must be polynomially equivalent to
$m_G(U)$. Although I believe this likely to be the case, it is
possible to imagine, for example, that $d_{F_1}(I,U)$ is small
(maybe even a constant), and yet the oscillations in any
near-optimal path $\gamma$ are so wild that to approximate it in
the quantum circuit model requires exponentially many
gates. If this were the case then the cost in doing the
computation would not be due to the actual Hamiltonian
evolution, but rather due to extremely frequent switching
between very short evolutions by different Pauli Hamiltonians.

In the event that this turns out to be the case, one
potential resolution would be to work instead with $F_p$,
in which a penalty function $p(\cdot)$ is used to penalize the
use of higher-weight Pauli matrices in the Hamiltonian
evolution. As was the case for $F_q$ it is not clear what
the best choice of penalty function is, but various simple
alternatives naturally suggest themselves. In particular,
as for $F_q$, by choosing $p$ appropriately we can effectively
rule out some directions of movement on the manifold.
When $F_p$ is approximated by a suitable Finsler metric
(as described in the next paragraph), ruling out such
directions of movement places us effectively in the realm
of "sub-Finsler" geometry [49, 50].

The main disadvantage of $F_1$ and $F_p$ is that they are
not Finsler metrics, and thus we can't directly apply
the calculus of variations to study their minimal length
curves. To remedy this situation, in Appendix A we explain
how to approximate $F_1$ and $F_p$ by a family of Finsler
metrics $F_{1\Delta}$ and $F_{p\Delta}$, where $\Delta > 0$ is a small parameter
such that as $\Delta \to 0$, $F_{1\Delta} \to F_1$ and $F_{p\Delta} \to F_p$. More
precisely, we show that

F^U.y) < F 1A (U,y) < ^ff^ , (29) 

F p (U,y) < F pA (U,y)<^^, (30) 

where $P = \sum_\sigma p(\mathrm{wt}(\sigma))$. Thus, provided $\Delta$ is sufficiently
small the Finsler metrics $F_{1\Delta}$ and $F_{p\Delta}$ provide excellent
approximations to $F_1$ and $F_p$, respectively. As a result,
our strategy for understanding the minimal length curves
of $F_1$ and $F_p$ is to study them via the geodesics of $F_{1\Delta}$
and $F_{p\Delta}$.

3. Summary and comparison 

We have introduced four classes of local metric,
$F_1$, $F_2$, $F_p$, and $F_q$. All four provide lower bounds on
$m_G(U)$ through the inequality $d_F(I,U) \le m_G(U)$. Summarizing
and comparing their various properties:

1. $F_2$ is capable of producing at best a constant lower
bound on $m_G(U)$, and thus is not especially interesting.

2. $F_q$ is a modified version of $F_2$ in which we introduce
a penalty for the application of higher-weight Pauli
Hamiltonians. The main advantages of $F_q$ are that
it is easy to compute and Riemannian. It is also
straightforward to compute quantities such as curvature
and the Christoffel symbols using standard
results about right-invariant Riemannian metrics.

3. $F_1$ is the best motivated of the four local metrics.
In particular, we showed that $d_{F_1}(I,U)$ is the
minimal time required to synthesize $U$ using some
set of efficiently simulatable Hamiltonians. It can
be approximated arbitrarily well using a suitable
Finsler metric $F_{1\Delta}$.

4. $F_p$ is a modified version of $F_1$ in which we introduce
a penalty for the application of higher-weight Pauli
Hamiltonians. It can also be approximated arbitrarily
well using a suitable Finsler metric $F_{p\Delta}$.

I conjecture that for suitable choices of the penalty functions
$p$ and $q$, all three of the local metrics $F_1$, $F_p$, and $F_q$
are polynomially equivalent to $m_G(U)$, and thus could, in
principle, be used to prove exponential lower bounds on
$m_G(U)$. In this paper we will not resolve the correctness
of this conjecture, although Section IV C presents some
evidence that $F_1$ (and thus $F_p$) has exponential length
minimal curves, which provides some indirect evidence
for the conjecture.

Despite the lack of a proof of this conjecture, it remains
an interesting problem to understand the geodesic structure
of each of these classes of local metric, and what
this implies about the distances $d_F(I,U)$. It is to this
problem that we turn for the remainder of this paper.



III. COMPUTING THE GEODESIC EQUATION 

In this section we'll explain how to explicitly construct
the geodesic equation for each of the Finsler metrics we
have introduced. This equation is a second-order differential
equation whose solutions are geodesics of the
Finsler manifolds, i.e., curves in $SU(2^n)$ which are local
extrema of the Finsler length.

The exact form of the geodesic equation is rather complex,
even for the simplest of our local metrics; we shall
not write it out explicitly. Our goal in this section is
to describe a general procedure which can be used to
compute the geodesic equation, and thus enable numerical
and analytic investigation of geodesics. A detailed
numerical investigation of the geodesic structure is underway
and will appear elsewhere.

We begin in Subsection III A with a brief review of the
geodesic equation for Finsler geometry. This is standard
material in Finsler geometry, and so we cover it quickly,
merely outlining derivations, and referring the reader to
standard references such as [44] for more details.

In order to apply the geodesic equation it is most
convenient to pick out a single co-ordinate system$^{11}$ for
$SU(2^n)$, and carry out all calculations with respect to
those co-ordinates. In particular, using a single set of
co-ordinates greatly facilitates integration of the geodesic
equation, and analytic investigation of that equation.
Unfortunately, the Finsler metrics we have introduced in
Subsection II E are all defined in terms of local adapted
co-ordinates, which vary from point to point on $SU(2^n)$.

To remedy this situation, the majority of this section
is taken up with learning how to change from local
adapted co-ordinates to a single fixed set of co-ordinates
for $TSU(2^n)$, which we call natural Pauli co-ordinates.
We explain how to make this change of variables for the
special case of $TSU(2)$ in Subsection III D. In Subsection
III E we describe the main ideas behind the change
of variables for $TSU(2^n)$, before describing some convenient
calculational techniques for making this change in
Subsection III F.

With these results in hand it is possible in principle
to explicitly compute the geodesic equation for each of
the local metrics we have introduced. In practice, the
actual form of the equation is rather complicated, and it
is more convenient to investigate solutions numerically,
or using techniques such as the analysis of symmetries.
Indeed, as we do not do any numerical analysis in this
paper, later sections of the paper in fact depend only
on the basic form of the geodesic equation, presented in
Subsection III A, and so the other parts of this section
may be skipped if the reader's main interest is in the
geodesic solutions constructed in Section IV.

11 In fact, no single co-ordinate system can cover all of $SU(2^n)$.
However, it is possible to pick a co-ordinate system that covers
all of $TSU(2^n)$ except a set of measure zero, and that is
what we will do. We return later to the question of what to do
on the set of measure zero.

The section also contains a brief digression, in Subsection
III B, whose purpose is to illustrate the results
of Subsection III A with some simple results on the effects
ancilla qubits have on minimal length curves. In
particular, we show that for a suitable family of Finsler
metrics, there is a neighbourhood of the identity in which
for all unitaries $U$ the distance $d_F(I,U)$ is equal to the
distance $d_F(I \otimes I, U \otimes I)$, i.e., the distance is unaffected
by the addition of ancilla on which the unitary acts trivially.
These results, simple as they are, represent our
only progress on the problem of understanding the effect
ancilla qubits have on minimal size quantum circuits.



A. General form of the geodesic equation 

In this subsection we construct the geodesic equation.
We follow the standard procedure used in Finsler geometry
(see, e.g., [44]) to construct the equation, and for that
reason we merely outline the relevant calculations.

In order to construct the geodesic equation, it is convenient
to fix a set of co-ordinates for $SU(2^n)$. We will label
these co-ordinates $x^j$, and the corresponding natural
co-ordinates for the tangent space $y^j$. The co-ordinate
system $x^j$ chosen is a completely arbitrary chart from
among the atlas of possible co-ordinates on the manifold.
Note that at any point along a curve $s = s(t)$ the tangent
vector has natural co-ordinates given by $y^j = dx^j/dt$.

In general, the co-ordinates $x^j$ do not cover all of
$SU(2^n)$. As a result, to construct geodesics it is in general
necessary to change the co-ordinate system being
used as the geodesic moves across the manifold. However,
for this initial discussion it is most convenient to
imagine that the co-ordinate system has been fixed, and
we are computing geodesics that lie within the region
covered by that co-ordinate system.

Recall that we defined the length of a curve $s : I \to M$
by $l_F(s) = \int_I dt\, F(s(t), [s]_t)$, where $[s]_t \in T_{s(t)}M$ is the
vector tangent to $s$ at $t$. In terms of the co-ordinates
$x^j$ and $y^j = dx^j/dt$ this may be rewritten $l_F(s) =
\int_I dt\, F(x,y)$, where $F = F(x,y)$ is $F$ expressed in terms
of the co-ordinates $x = (x^j)$ and the corresponding natural
co-ordinates $y = (y^j) = (dx^j/dt)$ for the tangent
vector $[s]_t$.

In order to determine the geodesics which minimize
the length we use the calculus of variations, a review of
which may be found in [51]. It is a standard result in
the calculus of variations that any curve $s$ which is an
extremum of the functional $\int_I dt\, F(x,y)$ must satisfy the
Euler-Lagrange equations:

$$ \frac{d}{dt}\left(\frac{\partial F}{\partial y^j}\right) = \frac{\partial F}{\partial x^j}. \tag{31} $$



We will sometimes refer to this equation as the geodesic
equation, for its solutions give rise to geodesics. However,
for practical purposes it is more convenient to recast
the geodesic equation in other forms. First, and rather
remarkably, it is possible to replace $F$ by $F^2$ in Equation (31)
to get an equation which is essentially equivalent:

$$ \frac{d}{dt}\left(\frac{\partial F^2}{\partial y^j}\right) = \frac{\partial F^2}{\partial x^j}. \tag{32} $$



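To understand this equivalence and its significance, note
the easily-verified identity:

$$ \frac{d}{dt}\left(\frac{\partial F^2}{\partial y^j}\right) - \frac{\partial F^2}{\partial x^j} = 2\,\frac{dF}{dt}\frac{\partial F}{\partial y^j} + 2F\left[\frac{d}{dt}\left(\frac{\partial F}{\partial y^j}\right) - \frac{\partial F}{\partial x^j}\right]. \tag{33} $$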



Suppose $s = s(t)$ is a curve which solves the geodesic
equation, Equation (31). Then we can reparameterize
this curve to give an equal length curve $\tilde s$ which has
constant speed, i.e., $dF/dt = 0$ along the curve $\tilde s$. It
is straightforward to verify that the curve $\tilde s$ also solves
the geodesic equation, Equation (31). However, since
$dF/dt = 0$, we see from Equation (33) that $\tilde s$ is also a
solution to Equation (32).

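Conversely, suppose the curve $s = s(t)$ is a solution to
Equation (32). Then we have:

$$ \frac{dF^2}{dt} = \sum_j \left( \frac{\partial F^2}{\partial x^j}\frac{dx^j}{dt} + \frac{\partial F^2}{\partial y^j}\frac{dy^j}{dt} \right) \tag{34} $$

$$ \phantom{\frac{dF^2}{dt}} = \sum_j \frac{d}{dt}\left( \frac{\partial F^2}{\partial y^j}\, y^j \right). \tag{35} $$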



Using the homogeneity of $F^2$ in $y$ we see that $\sum_j (\partial F^2/\partial y^j)\, y^j = 2F^2$, and
so the previous equation implies $dF^2/dt = 2\, dF^2/dt$, and
so any solution to Equation (32) automatically satisfies
the constant speed condition $dF^2/dt = 0$.

Summing up, the class of curves which solve Equation (31)
is equivalent to the class of curves which solve
Equation (32), up to a reparameterization which leaves
the length invariant, and thus is of no interest. However,
solutions to Equation (32) have the additional useful
property that they are automatically curves of constant
speed. We therefore refer to either Equation (31)
or Equation (32) as the geodesic equation, depending on
context.

Equation (32) may be recast in an equivalent form
analogous to the standard geodesic equation for Riemannian
manifolds. First, using Equation (21)
we substitute $F^2 = g_{lm} y^l y^m$, where we define
$g_{lm} \equiv \frac{1}{2}\, \partial^2 F^2 / \partial y^l \partial y^m$ to be the Hessian, and we use the summation
convention that repeated indices are summed over,
unless otherwise stated. The right-hand side of Equation (32)
then becomes $g_{lm,x^j}\, y^l y^m$, where we use the
subscript notation "$, x^j$" to indicate a partial derivative
with respect to the $x^j$ co-ordinate.

To analyse the left-hand side of Equation (32), we
again substitute $F^2 = g_{lm} y^l y^m$, and apply the usual rules
of calculus together with Equation (22) to obtain

$$ \frac{\partial F^2}{\partial y^j} = 2 g_{jm}\, y^m. \tag{36} $$



Taking the total derivative with respect to $t$, and again
using Equation (22), we obtain

$$ \frac{d}{dt}\left(\frac{\partial F^2}{\partial y^j}\right) = 2 g_{jm,x^n}\, y^m y^n + 2 g_{jm} \frac{dy^m}{dt}. \tag{37} $$



Using these results we see that the geodesic equation,
Equation (32), may be recast in the form

$$ 2 g_{jm,x^n}\, y^m y^n + 2 g_{jm} \frac{dy^m}{dt} = g_{lm,x^j}\, y^l y^m. \tag{38} $$

By assumption, the matrix whose components are the $g_{jk}$
is strictly positive, and so it is possible to define a matrix
$g^{jk}$ which is the inverse of $g_{jk}$. Multiplying Equation (38)
by this inverse and doing some rearrangement, we obtain
our alternate form of the geodesic equation,

$$ \frac{d^2 x^j}{dt^2} + \Gamma^j_{kl} \frac{dx^k}{dt} \frac{dx^l}{dt} = 0, \tag{39} $$

where the Christoffel symbols $\Gamma^j_{kl}$ are defined by

$$ \Gamma^j_{kl} \equiv \frac{g^{jm}}{2} \left( g_{mk,x^l} + g_{ml,x^k} - g_{kl,x^m} \right). \tag{40} $$



Formally, this definition for the Christoffel symbols appears
identical to that used in Riemannian geometry.
The difference is that in Riemannian geometry the $g_{jk}$
are functions of $x$ alone, while in Finsler geometry they
are typically functions of $y$ as well.

Summing up, we have presented the geodesic equation
in three different (but equivalent) forms, Eqs. (31), (32),
and (39). The latter is explicitly in the form of a second
order differential equation, and so the usual existence and
uniqueness theorems for second order ordinary differential
equations apply. In particular, given an initial position
and velocity (i.e., tangent vector) on the manifold,
the remainder of the geodesic is completely specified by
Equation (39). Of course, this form of initial data problem
is not the problem of most interest to us. We are
more concerned with the problem of studying geodesics
where two points on the geodesic are specified, but the
initial velocity is unknown.
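
To illustrate the initial data problem, the following Python sketch integrates Equation (39) numerically in the Riemannian special case, where $g_{jk}$ depends on $x$ alone. The Christoffel symbols of Equation (40) are obtained by central finite differences and the resulting ordinary differential equation is stepped with a fourth-order Runge-Kutta scheme. The test metric (the hyperbolic upper half-plane) is chosen only because its vertical geodesics are known in closed form; it is not one of the metrics of this paper, and all function names are hypothetical.

```python
import numpy as np

def christoffel(metric, x, h=1e-5):
    """Gamma[j, k, l] of Equation (40), using central differences in x."""
    dim = len(x)
    g_inv = np.linalg.inv(metric(x))
    dg = np.empty((dim, dim, dim))  # dg[m, a, b] = d g_{ab} / d x^m
    for m in range(dim):
        step = np.zeros(dim)
        step[m] = h
        dg[m] = (metric(x + step) - metric(x - step)) / (2 * h)
    # bracket[m, k, l] = g_{mk,l} + g_{ml,k} - g_{kl,m}
    bracket = np.einsum("lmk->mkl", dg) + np.einsum("kml->mkl", dg) - dg
    return 0.5 * np.einsum("jm,mkl->jkl", g_inv, bracket)

def geodesic(metric, x0, y0, t_final, steps=1000):
    """Integrate d^2x/dt^2 = -Gamma(x) (dx/dt)(dx/dt) from initial data (x0, y0)."""
    dim = len(x0)

    def rhs(state):
        x, y = state[:dim], state[dim:]
        gamma = christoffel(metric, x)
        return np.concatenate([y, -np.einsum("jkl,k,l->j", gamma, y, y)])

    state = np.concatenate([x0, y0]).astype(float)
    dt = t_final / steps
    for _ in range(steps):  # classical fourth-order Runge-Kutta
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return state[:dim], state[dim:]

def speed(metric, x, y):
    """F(x, y) = sqrt(g_{jk}(x) y^j y^k)."""
    return np.sqrt(y @ metric(x) @ y)

def half_plane_metric(x):
    """Hyperbolic upper half-plane: g = identity / (x^2 coordinate)^2."""
    return np.eye(2) / x[1] ** 2

x_end, y_end = geodesic(half_plane_metric, np.array([0.0, 1.0]), np.array([0.0, 1.0]), 1.0)
print(x_end)                                   # close to (0, e)
print(speed(half_plane_metric, x_end, y_end))  # stays close to 1
```

The speed remains constant along the computed curve, as expected for solutions of Equation (32). For a genuinely Finsler metric the same scheme would need $g_{jk}$ to be evaluated as a function of both $x$ and $y$.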



B. Application: Ancilla qubits and direct sum 
theorems 

As an illustration of the results of the previous subsection,
consider the problem of determining $m_G(U \otimes V)$,
where $U$ is a unitary on an $n_A$-qubit system, labeled
$A$, and $V$ is a unitary on an $n_B$-qubit system, labeled
$B$. An interesting question is to ask how $m_G(U \otimes V)$
is related to $m_G(U)$ and $m_G(V)$, where the notation
$G$ is overloaded in the obvious way. It is clear that
$m_G(U \otimes V) \le m_G(U) + m_G(V)$. Might this inequality
sometimes be strict, or must it be satisfied with equality?

Questions like this are the province of direct sum theorems,
which seem to have first been considered by [52].
Essentially, a direct sum theorem seeks to establish
whether a set of two or more computational tasks can
be collectively accomplished using fewer resources than
the sum of the resources required for the individual tasks.

Of particular interest in the context of the current pa- 
per is the case V = I, which is related to the problem 
of determining whether or not ancilla can help in imple- 
menting U. 

With the tools available we cannot directly study the
behaviour of $m_G(U \otimes V)$, but we can study the related
question of whether $d_{F_{AB}}(I_A \otimes I_B, U \otimes V) = d_{F_A}(I_A, U) +
d_{F_B}(I_B, V)$, where $F_A$, $F_B$ and $F_{AB}$ are suitable Finsler
metrics on the respective spaces. We will not solve this
problem in general, but can easily obtain some simple
results indicating that, at least near the identity, this
equality will always be satisfied for suitable choices of
the Finsler metrics.

Suppose $F_A$, $F_B$ and $F_{AB}$ are Finsler metrics on
$SU(2^{n_A})$, $SU(2^{n_B})$ and $SU(2^{n_A + n_B})$, respectively. We
say they form an additive triple of Finsler metrics if:

$$ F_{AB}^2(U \otimes V, H_A + H_B) = F_A^2(U, H_A) + F_B^2(V, H_B), \tag{41} $$

where $H_A \in su(2^{n_A})$, $H_B \in su(2^{n_B})$, and we abuse notation
by omitting tensor factors which act trivially, like
$I_A \otimes \cdot$ and $\cdot \otimes I_B$.

Suppose $U(t)$ is a geodesic of $F_A$, and $V(t)$ is a geodesic
of $F_B$. If $F_A$, $F_B$ and $F_{AB}$ are an additive triple of Finsler
metrics, then it follows from the linearity of the geodesic
equation, Equation (32), that $W(t) = U(t) \otimes V(t)$ is a
geodesic of $F_{AB}$.

An example of an additive triple of Finsler metrics is
the triple $F_{q_A}$, $F_{q_B}$, and $F_{q_{AB}}$, where the penalty functions
$q_A$, $q_B$ and $q_{AB}$ satisfy the consistency condition
$q_A(j) = q_B(j) = q_{AB}(j)$, for all $j$ where this condition is
well-defined. It follows that if $U(t)$ and $V(t)$ are geodesics
of $F_{q_A}$ and $F_{q_B}$, then $U(t) \otimes V(t)$ is a geodesic of $F_{q_{AB}}$.
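
The additivity condition, Equation (41), can be checked numerically in the simplest case of constant penalty $q \equiv 1$, for which the squared cost of a traceless Hamiltonian is taken here to be $\mathrm{tr}(H^2)/2^n$, the sum of its squared Pauli coefficients. The Python sketch below is a sanity check under that assumption rather than a proof; the helper names are hypothetical.

```python
import numpy as np

def random_traceless_hermitian(n_qubits, rng):
    """A random element of su(2^n), written as a traceless Hermitian matrix."""
    dim = 2 ** n_qubits
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    h = (a + a.conj().T) / 2
    return h - np.trace(h).real / dim * np.eye(dim)

def f2_squared(h):
    """tr(H^2) / 2^n, i.e. the sum of squared Pauli coefficients of H."""
    return np.trace(h @ h).real / h.shape[0]

rng = np.random.default_rng(0)
n_a, n_b = 2, 1
h_a = random_traceless_hermitian(n_a, rng)
h_b = random_traceless_hermitian(n_b, rng)

# Restore the tensor factors that Equation (41) omits.
h_ab = np.kron(h_a, np.eye(2 ** n_b)) + np.kron(np.eye(2 ** n_a), h_b)

print(f2_squared(h_ab), f2_squared(h_a) + f2_squared(h_b))  # the two agree
```

The cross term in $\mathrm{tr}((H_A + H_B)^2)$ is proportional to $\mathrm{tr}(H_A)\,\mathrm{tr}(H_B)$, which vanishes because both Hamiltonians are traceless.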

We have seen that a tensor product of geodesic curves
is itself a geodesic curve, for additive triples of Finsler
metrics. Can we conclude that the shortest curve connecting
$I_A \otimes I_B$ and $U \otimes V$ is just the tensor product of
the shortest curve connecting $I_A$ and $U$ with the shortest
curve connecting $I_B$ and $V$?

I do not know if this is the case, in general. However,
a well-known theorem of Finsler geometry (see, e.g., Section
6.3 of [44]) asserts that for any manifold $M$ equipped
with a Finsler metric $F$, there exists a constant $r > 0$
such that any geodesic of length less than $r$ is guaranteed
to be a minimal length curve.

It follows that for $U$ and $V$ in some finite-size neighbourhood
of the respective identity operations, $I_A$ and
$I_B$, the minimal curve from $I_A \otimes I_B$ to $U \otimes V$ is just
the tensor product of the minimal curves in the respective
spaces. It follows that in this neighbourhood
the distance $d_{F_{AB}}(I_A \otimes I_B, U \otimes V)$ is equal to the sum
$d_{F_A}(I_A, U) + d_{F_B}(I_B, V)$.

Specializing to the case where $V = I_B$, we see that for
all $U$ in some finite-size neighbourhood of $I_A$ the minimal
length curve from $I_A \otimes I_B$ to $U \otimes I_B$ is guaranteed to be
exactly equal to the minimal length curve from $I_A$ to
$U$. That is, there is a neighbourhood of the identity in
which the presence of ancilla does not help in shortening
the length of the minimal curves.

C. Pauli co-ordinates 

As noted in the introduction to this section, our primary
goal in this section is to explain how to compute
the geodesic equation with respect to a fixed co-ordinate
system on $SU(2^n)$. The co-ordinates we shall use are
the Pauli co-ordinates. In our earlier language, Pauli
co-ordinates are $I$-adapted local co-ordinates for $SU(2^n)$,
where $I$ is the $n$-qubit identity operation. The unitary
corresponding to Pauli co-ordinates $x$ is given by

$$ \exp(-i x \cdot \sigma) = \exp\Big( -i \sum_\sigma x^\sigma \sigma \Big). \tag{42} $$

Inverting, the co-ordinates $x^\sigma$ corresponding to some unitary
$U$ are given by

$$ x^\sigma = \frac{i\, \mathrm{tr}(\sigma \ln U)}{2^n}, \tag{43} $$

where $\ln$ is some branch of the logarithm. We will be
particularly interested in the case where $\ln$ is the standard
branch of the logarithm, defined around a branch
cut along the negative real axis. We call these co-ordinates
the standard Pauli co-ordinates, or just Pauli
co-ordinates. Note that the standard Pauli co-ordinates
are defined for any unitary operator whose spectrum
does not include $-1$, and thus are defined everywhere
in $SU(2^n)$ except on a set of measure zero.
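
As a concrete illustration of Equations (42) and (43), the Python sketch below builds a two-qubit unitary from a vector of Pauli co-ordinates and then recovers the co-ordinates using the standard branch of the logarithm. The function names are hypothetical, and the logarithm is computed by diagonalizing $U$, which assumes that $U$ is diagonalizable with a well-conditioned eigenbasis (true for generic unitaries).

```python
import itertools
import numpy as np

SINGLE_QUBIT_PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_basis(n):
    """All 4^n n-qubit Pauli matrices, with the identity first."""
    basis = []
    for combo in itertools.product(SINGLE_QUBIT_PAULIS, repeat=n):
        m = np.array([[1.0 + 0j]])
        for p in combo:
            m = np.kron(m, p)
        basis.append(m)
    return basis

def unitary_from_pauli_coordinates(x, n):
    """Equation (42): U = exp(-i sum_sigma x^sigma sigma), sigma != identity."""
    h = sum(c * s for c, s in zip(x, pauli_basis(n)[1:]))
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w)) @ v.conj().T

def pauli_coordinates(u):
    """Equation (43): x^sigma = i tr(sigma ln U) / 2^n, standard branch of ln."""
    n = int(np.log2(u.shape[0]))
    w, v = np.linalg.eig(u)
    # np.log on complex input returns the principal branch, Im in (-pi, pi].
    log_u = (v * np.log(w)) @ np.linalg.inv(v)
    return np.array([(1j * np.trace(s @ log_u)).real / 2 ** n
                     for s in pauli_basis(n)[1:]])

rng = np.random.default_rng(0)
x = rng.uniform(-0.3, 0.3, size=15)   # 4^2 - 1 co-ordinates for two qubits
u = unitary_from_pauli_coordinates(x, 2)
print(np.max(np.abs(pauli_coordinates(u) - x)))  # round-trip error
```

The round trip succeeds here because the co-ordinates are small enough that the spectrum of $x \cdot \sigma$ lies strictly inside $(-\pi, \pi)$; for larger $x$ the standard branch returns a different, equivalent set of co-ordinates.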

Just as for local adapted co-ordinates, the Pauli co-ordinates
on $SU(2^n)$ give rise to natural co-ordinates
on the tangent space $T_U SU(2^n)$. In the remainder of
this section we will typically use $x^\sigma$ to denote Pauli
co-ordinates, $\tilde x^\sigma$ to denote local adapted co-ordinates,
and $y^\sigma$ and $\tilde y^\sigma$ to denote the corresponding natural
co-ordinates on $T_U SU(2^n)$.

D. Changing co-ordinates in $T_U SU(2)$

Let's begin with the example of $T_U SU(2)$, where it is
relatively straightforward to change between the natural
Pauli and natural locally adapted co-ordinates. The key
to making the change is the following theorem. Note that
in this subsection (and in the associated Appendix B) we
will find it useful to work both with vectors in $\mathbb{R}^3$, and
with vectors relating directly to objects in the tangent
bundle $T_U SU(2)$. We will refer to the former using the
notations $\vec x$, $\vec y$ and $\vec{\tilde y}$, while for the latter we will use $x$ to
refer to the vector of Pauli co-ordinates for $U \in SU(2)$,
and $y$ to refer to an element of $T_U SU(2)$.

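Theorem 2. Fix $\vec x \in \mathbb{R}^3$. Then

$$ \exp(-i(\vec x + t \vec y) \cdot \vec\sigma) = \exp(-i t\, \vec{\tilde y} \cdot \vec\sigma) \exp(-i \vec x \cdot \vec\sigma) + O(t^2), \tag{44} $$

where $\vec y$ may be expressed as a function of $\vec x$ and $\vec{\tilde y}$:

$$ \vec y = \vec{\tilde y}_\parallel + \|\vec x\| \cot(\|\vec x\|)\, \vec{\tilde y}_\perp + \vec{\tilde y} \times \vec x. \tag{45} $$

In this expression $\hat x \equiv \vec x / \|\vec x\|$ is the normalized vector
in the $\vec x$ direction, $\vec{\tilde y}_\parallel \equiv (\hat x \cdot \vec{\tilde y})\, \hat x$ is the component of $\vec{\tilde y}$ in
the $\hat x$ direction, and $\vec{\tilde y}_\perp \equiv \vec{\tilde y} - \vec{\tilde y}_\parallel$ is the component of $\vec{\tilde y}$
orthogonal to the $\hat x$ direction. We can invert this equation
to express $\vec{\tilde y}$ in terms of $\vec x$ and $\vec y$, obtaining:

$$ \vec{\tilde y} = \vec y_\parallel + \mathrm{sinc}(2\|\vec x\|)\, \vec y_\perp + \mathrm{sinc}^2(\|\vec x\|)\, \vec x \times \vec y_\perp, \tag{46} $$

where $\vec y_\parallel$ is now the component of $\vec y$ in the $\hat x$ direction,
$\vec y_\perp$ is the component of $\vec y$ orthogonal to $\hat x$, and $\mathrm{sinc}(z) \equiv
\sin(z)/z$ is the standard sinc function.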

The proof of this theorem is a straightforward calculation.
We describe the details in Appendix B. Alternately,
it is a useful and not entirely trivial exercise to deduce
the theorem from the more general results about $SU(2^n)$
in the next subsection.
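
Theorem 2 lends itself to a direct numerical check. The Python sketch below uses the closed form $\exp(-i \vec v \cdot \vec\sigma) = \cos\|\vec v\| - i \sin\|\vec v\|\, \hat v \cdot \vec\sigma$, computes the adapted components from the Pauli components via the sinc formula, Equation (46), and confirms that the two sides of Equation (44) differ only at order $t^2$. It also checks that the cotangent formula, Equation (45), undoes Equation (46). The function names and the particular vectors are illustrative assumptions.

```python
import numpy as np

SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

def su2(v):
    """exp(-i v . sigma) for a real 3-vector v, in closed form."""
    r = np.linalg.norm(v)
    if r == 0:
        return np.eye(2, dtype=complex)
    v_dot_sigma = np.einsum("a,ajk->jk", v, SIGMA)
    return np.cos(r) * np.eye(2) - 1j * np.sin(r) / r * v_dot_sigma

def sinc(z):
    """Unnormalized sinc, sin(z)/z (note that np.sinc uses sin(pi z)/(pi z))."""
    return np.sin(z) / z

def adapted_from_pauli(x, y):
    """Equation (46): adapted components in terms of x and Pauli components y."""
    r = np.linalg.norm(x)
    x_hat = x / r
    y_par = np.dot(x_hat, y) * x_hat
    y_perp = y - y_par
    return y_par + sinc(2 * r) * y_perp + sinc(r) ** 2 * np.cross(x, y_perp)

def pauli_from_adapted(x, y_tilde):
    """Equation (45): Pauli components in terms of x and adapted components."""
    r = np.linalg.norm(x)
    x_hat = x / r
    par = np.dot(x_hat, y_tilde) * x_hat
    perp = y_tilde - par
    return par + r / np.tan(r) * perp + np.cross(y_tilde, x)

x = np.array([0.3, -0.5, 0.8])
y = np.array([0.7, 0.2, -0.4])
y_tilde = adapted_from_pauli(x, y)

for t in (1e-2, 1e-3):
    lhs = su2(x + t * y)
    rhs = su2(t * y_tilde) @ su2(x)
    print(t, np.linalg.norm(lhs - rhs))  # shrinks like t^2

print(np.max(np.abs(pauli_from_adapted(x, y_tilde) - y)))  # inverse check
```

Reducing $t$ by a factor of ten reduces the discrepancy by roughly a factor of one hundred, as expected for an $O(t^2)$ remainder.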

To see how this theorem enables us to change co-ordinates,
fix $U \in SU(2)$, and fix $y \in T_U SU(2)$. Suppose
$x$ are the Pauli co-ordinates for $U$. Then we have

$$ y = \sum_\sigma y^\sigma \left( \frac{\partial}{\partial x^\sigma} \right)_U \tag{47} $$

for some set of co-ordinates $y^\sigma$, and where $(\partial/\partial x^\sigma)_U$ are
the natural Pauli co-ordinate basis vectors for $T_U SU(2)$.
Setting $\vec x = x$ and letting $\vec y$ have components $y^\sigma$, we see
that the vector $y \in T_U SU(2)$ is tangent to the curve
$\exp(-i(\vec x + t \vec y) \cdot \vec\sigma)$ at $t = 0$. That is:

$$ y = [\exp(-i(\vec x + t \vec y) \cdot \vec\sigma)]_{t=0}. \tag{48} $$

Applying Theorem 2 we obtain

$$ y = [\exp(-i t\, \vec{\tilde y} \cdot \vec\sigma)\, U + O(t^2)]_{t=0}. \tag{49} $$

Neglecting the terms of order $t^2$ does not change the tangent
at $t = 0$, and so

$$ y = [\exp(-i t\, \vec{\tilde y} \cdot \vec\sigma)\, U]_{t=0}. \tag{50} $$

It follows that $\tilde y^\sigma$ are the natural adapted co-ordinates
for $T_U SU(2)$. Thus Theorem 2 relates the natural Pauli
co-ordinates $y^\sigma$ on $T_U SU(2)$ to the natural $U$-adapted
co-ordinates $\tilde y^\sigma$ on $T_U SU(2)$.

E. Changing co-ordinates in $T_U SU(2^n)$

The key result enabling the change between natural
Pauli and natural locally adapted co-ordinates is a generalization
of Theorem 2 which applies to unitary operators
in arbitrary dimensions. This result is due to Baker [53],
Campbell [54, 55] and Hausdorff [56], and we refer to it
as the BCH formula. Note that this result is not what
is usually referred to as the BCH formula by physicists,
but is a generalization. See Section 3.4 of [57] for a recent
discussion of the BCH formula and related results.

To state the BCH formula it helps to first define linear
superoperators (i.e., linear operations on matrices) $\mathrm{ad}_X$
and $\mathcal{I}$ by $\mathrm{ad}_X(Z) \equiv [X, Z]$, and $\mathcal{I}(Z) \equiv Z$. With these
definitions we have the following.

Theorem 3 (BCH formula). Suppose $X$ and $Y$ are
$d \times d$ Hermitian operators. Then we have

$$ \exp(-i t \tilde Y) \exp(-i X) = \exp(-i(X + tY)) + O(t^2), \tag{51} $$

where the $d \times d$ Hermitian operator $\tilde Y$ is defined by

$$ \tilde Y = \mathcal{E}_X(Y), \tag{52} $$

and $\mathcal{E}_X$ is a linear superoperator defined by

$$ \mathcal{E}_X \equiv \frac{\exp(-i\, \mathrm{ad}_X) - \mathcal{I}}{-i\, \mathrm{ad}_X}. \tag{53} $$

In this theorem, the superoperator $\mathcal{E}_X$ is understood as
a formal power series. In particular, the operator $\mathrm{ad}_X$ is
not invertible, so strictly speaking the expression given
for $\mathcal{E}_X$ is not even defined. Nonetheless, treating the
expressions as power series, we see that

$$ \mathcal{E}_X = \sum_{j=0}^{\infty} \frac{(-i\, \mathrm{ad}_X)^j}{(j+1)!} \tag{54} $$

is defined and everywhere convergent. We will discuss in
the next subsection how to explicitly calculate the action
of $\mathcal{E}_X$ in a convenient fashion. For now we take it as given
that this can be done, and discuss how Theorem 3 allows
us to change variables in $T_U SU(2^n)$.

The discussion, of course, follows similar lines to the
discussion in the previous subsection. We fix $y \in
T_U SU(2^n)$, and suppose $x$ is a vector of Pauli co-ordinates
for $U$, so $U = \exp(-i x \cdot \sigma)$. We will find it
convenient to use $x^\sigma$ both to denote Pauli co-ordinates,
and also the particular Pauli co-ordinates for $U$, with the
meaning to be determined by context. Then we have

$$ y = \sum_\sigma y^\sigma \frac{\partial}{\partial x^\sigma} \tag{55} $$

$$ \phantom{y} = [\exp(-i(x + t y) \cdot \sigma)]_{t=0}, \tag{56} $$

where $y$ is the vector whose entries are the natural Pauli
co-ordinates $y^\sigma$. Using the BCH formula we have

$$ y = [\exp(-i t\, \tilde y \cdot \sigma)\, U + O(t^2)]_{t=0} \tag{57} $$

$$ \phantom{y} = [\exp(-i t\, \tilde y \cdot \sigma)\, U]_{t=0}, \tag{58} $$

where $\tilde y$ is determined by $x$ and $y$ through the equations

$$ \tilde y \cdot \sigma = \mathcal{E}_{x \cdot \sigma}(y \cdot \sigma) \tag{59} $$

$$ \phantom{\tilde y \cdot \sigma} = \frac{\exp(-i\, \mathrm{ad}_{x \cdot \sigma}) - \mathcal{I}}{-i\, \mathrm{ad}_{x \cdot \sigma}} (y \cdot \sigma). \tag{60} $$

Note that the components of $\tilde y$ may be extracted from
this expression by multiplying both sides by some specific
Pauli matrix $\sigma$ and taking the trace.

Equations (59) and (60) are general equations telling
us how to transform from natural Pauli co-ordinates in
$T_U SU(2^n)$ to natural adapted co-ordinates in $T_U SU(2^n)$.
By applying the inverse operation we can transform
from natural adapted co-ordinates into natural Pauli
co-ordinates. A straightforward but somewhat lengthy
calculation shows that in the case where $n = 1$ these results
reduce to the results for $SU(2)$ deduced in the previous
subsection. We omit the details of this calculation.



F. Explicit computation of the change of
co-ordinates



In the last subsection we explained how to change
from natural Pauli co-ordinates to natural adapted co-ordinates
in $T_U SU(2^n)$, through Equations (59) and (60).
These formulas are compact, but it is not entirely evident
how to perform an explicit calculation of this change of
co-ordinates. In this subsection we explain how to perform
such calculations, and also how to do the inverse
change, from natural adapted co-ordinates to natural
Pauli co-ordinates. This enables, in principle, the explicit
computation of all terms in the geodesic equation,
Equation (39).

One way of performing such calculations is to expand
$\mathcal{E}_{x \cdot \sigma}$ in a power series in $\mathrm{ad}_{x \cdot \sigma}$, as specified by Equation (54).
Computations can then be carried out to a
good approximation simply by computing the first few
terms of the power series. Along similar lines, the inverse
to $\mathcal{E}_{x \cdot \sigma}$ also has a power series expansion (see the
discussion in Section 3.4 of [57]), and so computations of
the inverse can be carried out along similar lines.

However, there is a useful alternate approach offering
the possibility of exact computation, which we now describe.
In particular, we will describe a general method
to compute $\mathcal{E}_X(Y)$.

There are two main difficulties facing us in the computation.
The first is that it is computationally inconvenient
to deal with superoperators like $\mathrm{ad}_X$. To alleviate
this difficulty we will vectorize our expressions. Vectorization
is a procedure that converts operators into vectors,
and superoperators into operators, to obtain equivalent
expressions involving only vectors and ordinary operators.
This step is not strictly necessary, but it is extremely
convenient. As the vectorization formalism is
not entirely standard in the quantum information literature,
an introduction to this formalism is presented in
Appendix C, to which the reader not familiar with vectorization
should now turn.

The second, and more serious, difficulty, is that $\mathrm{ad}_X$
is not invertible. Our solution is to decompose $Y$ into a
component lying in the kernel of $\mathrm{ad}_X$, and a component
lying in the orthocomplement of the kernel. We then
compute the action of $\mathcal{E}_X$ on each of these two spaces
separately. This computation is also greatly facilitated
by use of the vectorization formalism.

We begin with a useful characterization of the kernel
of $\mathrm{ad}_X$, which we denote $\ker(\mathrm{ad}_X)$. Recall that $\ker(\mathrm{ad}_X)$
consists of all those matrices $Z$ such that $\mathrm{ad}_X(Z) = 0$.
The following proposition gives a computationally convenient
description for $\ker(\mathrm{ad}_X)$.

Proposition 2. Suppose the Hermitian matrix $X$ has
spectral decomposition $X = \sum_j \lambda_j P_j$, where the $\lambda_j$ are the distinct
eigenvalues of $X$, and the $P_j$ project onto the corresponding
eigenspaces. Then the operation

$$ \mathcal{P}(Z) \equiv \sum_j P_j Z P_j \tag{61} $$

projects onto the kernel of $\mathrm{ad}_X$.

Proof: To see that $\mathcal{P}$ is a projector we need to show
that it is Hermitian and satisfies $\mathcal{P} \circ \mathcal{P} = \mathcal{P}$. Both of
these facts are easily verified.

To complete the proof we need to show that $\mathcal{P}(Z) = Z$
if and only if $Z$ is an element of the kernel of $\mathrm{ad}_X$. To
prove the forward implication, suppose $\mathcal{P}(Z) = Z$. Then
$Z = \sum_j P_j Z P_j$. Thus $\mathrm{ad}_X(Z) = \sum_{jk} \lambda_j [P_j, P_k Z P_k] = 0$,
as required. To prove the reverse implication, suppose
$\mathrm{ad}_X(Z) = 0$. Then $[X, Z] = 0$, and we conclude that it
must be possible to write $Z = \sum_j Z_j$, where $Z_j$ acts only
within the eigenspace corresponding to the eigenvalue $\lambda_j$.
It follows that $\mathcal{P}(Z_j) = Z_j$, and thus $\mathcal{P}(Z) = Z$, as
desired. □
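
Proposition 2 can be illustrated numerically. The Python sketch below groups the eigenvectors of a Hermitian matrix by (numerically) distinct eigenvalue to form the projectors $P_j$, applies Equation (61), and checks that the result commutes with $X$ and is unchanged by a second application. The tolerance used to decide when two eigenvalues coincide, and the function names, are assumptions of the sketch.

```python
import numpy as np

def spectral_projectors(x, tol=1e-9):
    """Projectors onto the eigenspaces of Hermitian x, one per distinct eigenvalue."""
    w, v = np.linalg.eigh(x)  # eigenvalues come back in ascending order
    projectors = []
    start = 0
    for end in range(1, len(w) + 1):
        # Close the current group when the next eigenvalue differs (or at the end).
        if end == len(w) or w[end] - w[start] > tol:
            block = v[:, start:end]
            projectors.append(block @ block.conj().T)
            start = end
    return projectors

def kernel_projection(x, z):
    """Equation (61): P(Z) = sum_j P_j Z P_j."""
    return sum(p @ z @ p for p in spectral_projectors(x))

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
x = q @ np.diag([1.0, 1.0, 2.0, 3.0]) @ q.conj().T   # one degenerate eigenvalue
z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

pz = kernel_projection(x, z)
print(np.linalg.norm(x @ pz - pz @ x))               # P(Z) commutes with X
print(np.linalg.norm(kernel_projection(x, pz) - pz)) # P is idempotent
```

Because the eigenvalue 1 is doubly degenerate in this example, the kernel of $\mathrm{ad}_X$ is larger than the set of matrices diagonal in the eigenbasis of $X$, which is why the projectors must be built per distinct eigenvalue rather than per eigenvector.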

To compute $\mathcal{E}_X(Z)$ we first consider two special cases:
the case when $Z$ is an element of $\ker(\mathrm{ad}_X)$, and the case
when $Z$ is in the orthocomplement to $\ker(\mathrm{ad}_X)$.

Case: $Z \in \ker(\mathrm{ad}_X)$. We see from inspection of the
power series expansion Equation (54) that all but the
first term vanishes, leaving $\mathcal{E}_X(Z) = Z$.

Case: $Z$ in the orthocomplement to $\ker(\mathrm{ad}_X)$. In
this space $\mathrm{ad}_X$ has a Moore-Penrose generalized inverse.
It is most convenient to express $\mathcal{E}_X(Z)$ in vectorized form
(c.f. Equation ED) ) 

$$ \mathrm{vec}(\mathcal{E}_X(Z)) = i (U^* \otimes U - I \otimes I)(X^* \otimes I - I \otimes X)^{-1}\, \mathrm{vec}(Z), \tag{62} $$

where the inverse operation is the Moore-Penrose generalized
inverse, easily computed by any of the standard
computer algebra packages.

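General case: We can now compute a general expression
for $\mathcal{E}_X(Z)$ by combining these two special cases and
our expression for $\mathcal{P}$, the projector onto $\ker(\mathrm{ad}_X)$. We
have

$$ \mathrm{vec}(\mathcal{E}_X(Z)) = \mathrm{vec}(\mathcal{E}_X(\mathcal{P}(Z))) + \mathrm{vec}(\mathcal{E}_X(\mathcal{Q}(Z))), \tag{63} $$

where $\mathcal{Q} \equiv \mathcal{I} - \mathcal{P}$ projects onto the orthocomplement of
$\ker(\mathrm{ad}_X)$. Using the previous observations we have

$$ \mathrm{vec}(\mathcal{E}_X(Z)) = \mathrm{vec}(\mathcal{P})\, \mathrm{vec}(Z) + i (U^* \otimes U - I \otimes I)(X^* \otimes I - I \otimes X)^{-1} (I \otimes I - \mathrm{vec}(\mathcal{P}))\, \mathrm{vec}(Z), \tag{64} $$

where the inverse is again the Moore-Penrose generalized
inverse. Note also that we have

$$ \mathrm{vec}(\mathcal{P}) = \sum_j P_j^* \otimes P_j, \tag{65} $$

where the $P_j$ project onto the eigenspaces of $X$ with distinct
eigenvalues.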

Equations (64) and (65) offer an explicit way of computing
the action of $\mathcal{E}_X$, and thus of making the change
of variables from natural Pauli co-ordinates to natural
adapted co-ordinates on $T_U SU(2^n)$. In practice, of
course, this computation may be rather cumbersome, but
it is possible in principle using the approach we
have described.

It is easy to invert Equation (64), obtaining

$$\mathrm{vec}(\mathcal{E}_X^{-1}(Z)) = \mathrm{vec}(\mathcal{P})\,\mathrm{vec}(Z) - i(X^* \otimes I - I \otimes X)(U^* \otimes U - I \otimes I)^{-1}(I \otimes I - \mathrm{vec}(\mathcal{P}))\,\mathrm{vec}(Z), \tag{66}$$

where the inverse operation is a Moore-Penrose generalized
inverse. Using this expression we can explicitly
compute the change of variables from natural adapted
co-ordinates to natural Pauli co-ordinates on $T_U SU(2^n)$.
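
The matrix expressions in Equations (64)-(66) can be checked numerically on a small example. The sketch below is illustrative only: it assumes the column-stacking convention for vec, takes $U = \exp(-iX)$, and uses a hypothetical 4-dimensional $X$ with one degenerate eigenvalue. The names `vecP`, `E`, `E_inv` and the tolerance `TOL` are choices made here, not notation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Random unitary V from a QR decomposition; X = V diag(lam) V^dagger is Hermitian,
# traceless, and has one doubly degenerate eigenvalue (0.3).
V, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
lam = np.array([0.3, 0.3, -0.1, -0.5])
X = (V * lam) @ V.conj().T
U = (V * np.exp(-1j * lam)) @ V.conj().T   # U = exp(-iX), assumed convention

# Projectors P_j onto the eigenspaces of X with distinct eigenvalues.
groups = [[0, 1], [2], [3]]
projectors = [V[:, g] @ V[:, g].conj().T for g in groups]


def vec(A):
    """Column-stacking vectorization, so vec(A Z B) = kron(B.T, A) @ vec(Z)."""
    return A.reshape(-1, order="F")


I = np.eye(d)
II = np.eye(d * d)
vecP = sum(np.kron(P.conj(), P) for P in projectors)   # Equation (65)
A = np.kron(U.conj(), U) - II                          # U* (x) U - I (x) I
B = np.kron(X.conj(), I) - np.kron(I, X)               # X* (x) I - I (x) X
TOL = 1e-10   # relative cutoff below which pinv treats singular values as zero

E = vecP + 1j * A @ np.linalg.pinv(B, TOL) @ (II - vecP)       # Equation (64)
E_inv = vecP - 1j * B @ np.linalg.pinv(A, TOL) @ (II - vecP)   # Equation (66)

print("max |E @ E_inv - I| =", np.abs(E @ E_inv - II).max())
```

An explicit cutoff is passed to `pinv` because the kernel directions of `B` appear numerically as singular values near machine precision, which the default cutoff may not discard reliably.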

IV. THE PAULI GEODESICS 

In this section we'll study a class of curves which
are geodesics for each of our families of Finsler metrics,
$F_{1\Delta}$, $F_{p\Delta}$, $F_2$, and $F_q$. Our study begins in
Subsection IV A, where we identify some isometries of our Finsler
metrics. In Subsection IV B we use these isometries
and the geodesic equation to identify a special class of
geodesic solutions, which we call Pauli geodesics. These
solutions are geodesics for all the local metrics we have
defined, although their lengths may be different for the
different local metrics. Examining these Pauli geodesics,
we find examples of unitary operators with multiple
(indeed, infinitely many) Pauli geodesics passing through
them. In Subsection IV C we show that the problem of
determining the minimal length Pauli geodesic passing
between $I$ and a unitary operation $U$ which is diagonal
in the computational basis is equivalent to solving
an instance of the closest vector in a lattice problem
(CVP). Also in this subsection, we show that if the
minimal length curve from $I$ through $U$ is unique then it
must be a Pauli geodesic, and so the length of the
minimal Pauli geodesic will be $d_F(I,U)$. Subsection IV D
uses the connection to CVP to argue that all but a tiny
fraction of unitaries diagonal in the computational basis
have exponentially long minimal Pauli geodesics. The
section concludes in Subsection IV E with a discussion of
the results obtained, and some caveats about their
implications.



A. Metric isometries 

In order to understand the space of solutions to the
geodesic equation, it is helpful to first study isometries
of the local metric, $F$, which in turn are reflected in
symmetry properties of the geodesics.

What do we mean by an isometry of $F$? Suppose
$h : M \to M$ is a diffeomorphism of the Finsler manifold $M$
to itself. If $F$ is a Finsler metric on $M$, then we say $h$
is an $F$-isometry if $l_F(s) = l_F(h \circ s)$ for all curves $s$. It
is clear that such an isometry preserves geodesics on the
manifold $M$, i.e., $h \circ s$ is a geodesic if and only if $s$ is a
geodesic. It is also straightforward to see that a necessary
and sufficient condition for $h$ to be an isometry is that

$$F(s(t), [s]_t) = F((h \circ s)(t), [h \circ s]_t) \tag{67}$$

for all curves $s$ and for all $t$. Note that $[h \circ s]_t = h_*[s]_t$,
where $h_* : T_{s(t)}M \to T_{h(s(t))}M$ is the linear pushforward
map, so this condition may be rewritten:

$$F(x, y) = F(h(x), h_* y) \tag{68}$$

for all $(x, y) \in TM$.

For a local metric $F : T\,SU(2^n) \to [0, \infty)$ the condition
Equation (68) that $h$ be an isometry is equivalent to the
condition

$$F(U, H) = F(h(U), h_* H), \tag{69}$$

where $h_*$ is a superoperator pushing forward the Hamiltonian
$H$ representing the tangent at $U$. When $F$ is
right-invariant this may be replaced by the condition

$$F(H) = F(h_* H), \tag{70}$$

where $h_*$ is (implicitly) a function of the location $U$ on
$SU(2^n)$, and Equation (70) must be true at all values of
$U$.

It will be convenient to regard $h_*$ as a matrix written
with respect to the $\sigma$ basis. For all our local metrics
a sufficient condition for Equation (70) to hold is that
$h_*$ be diagonal with respect to this basis, with entries
$\pm 1$. This corresponds to the condition that $F(H)$ does
not depend on the sign of the expansion coefficients in
$H = \sum_\sigma \gamma_\sigma \sigma$, but only on their absolute values. We
will call any right-invariant local metric with this property
a Pauli-symmetric local metric. It is clear that
$F_{1\Delta}$, $F_{p\Delta}$, $F_2$ and $F_q$ are all Pauli-symmetric local
metrics. Some of our local metrics admit larger classes of
isometries:



• $F_{1\Delta}$: It suffices that $h_*$ be a signed permutation,
i.e., there is a permutation $\pi$ of the Pauli matrices
such that $h_*(\sigma) = \pm\pi(\sigma)$.

• $F_{p\Delta}$: It suffices that $h_*$ be a block diagonal sum of
signed permutations, where the blocks correspond
to all those values of $\sigma$ for which $p(\mathrm{wt}(\sigma))$ takes the
same value.

• $F_2$: It suffices that $h_*$ be an orthogonal matrix.

• $F_q$: It suffices that $h_*$ be a block diagonal sum of
orthogonal matrices, where the blocks correspond
to all those values of $\sigma$ for which $q(\mathrm{wt}(\sigma))$ takes the
same value.

These classes of isometry sometimes impose severe
constraints on the form of $h$. For example, the continuity of
$h$ and the fact that $SU(2^n)$ is connected imply that if $h_*$
is a signed permutation for all values of $U$, then $h_*$ must
be a constant. It is not difficult to prove that this
constant uniquely determines $h$, so the set of such $h$ can
be labeled by the signed permutations, of which there are
only a finite number. Indeed, it is possible that the class
of isometries may be even further constrained: it is not
a priori clear that given a particular signed permutation
there even exists an $h$ such that $h_*$ takes on the value of
that signed permutation everywhere.

The problem of obtaining a complete classification of
the isometries is an interesting problem in its own right,
but it is not our main concern here. Rather, we will
construct some explicit examples of isometries $h$ realizing
one or more of these conditions, and use those isometries
to construct the Pauli geodesics.

Example: adjoint action of the Pauli group.
Suppose $\sigma$ is a generalized Pauli matrix. We can define
a corresponding map $h_\sigma : SU(2^n) \to SU(2^n)$ by
$h_\sigma(U) = \sigma U \sigma^\dagger$. A straightforward calculation shows that
$h_{\sigma *}(H) = \sigma H \sigma^\dagger$, so $h_{\sigma *}$ is indeed diagonal with entries
$\pm 1$, and $h_\sigma$ is an isometry of all our local metrics.

Example: complex conjugation. Define the map
$h : SU(2^n) \to SU(2^n)$ by $h(U) = U^*$. A calculation
shows that $h_*(H) = -H^*$, and thus $h_*$ is again diagonal
with entries $\pm 1$. It follows that the map $U \mapsto U^*$ is an
isometry of all our local metrics.

Example: adjoint action of local unitaries. Suppose
$W = W_1 \otimes \ldots \otimes W_n$ is a local unitary operation
on $n$ qubits. Define $h_W : SU(2^n) \to SU(2^n)$ by
$h_W(U) = W U W^\dagger$. Then $h_{W*}(H) = W H W^\dagger$, whence $h_W$
is an isometry for $F_2$ and $F_q$.

Example: adjoint action of the Clifford group.
Recall that the $n$-qubit Clifford group$^{12}$ consists of all
those $n$-qubit unitary operations $g$ having the property
that $g \sigma g^\dagger$ is a generalized Pauli matrix whenever $\sigma$ is
a generalized Pauli matrix. This group includes many
interesting unitary operations, including the controlled-not,
the Hadamard gate, and the generalized Pauli matrices
themselves.

12 Sometimes referred to as the normalizer of the Pauli group. See
Chapter 10 of [58] for a review of the Clifford group and the
associated stabilizer formalism, or [59] for much of the original
development of this formalism and its applications in quantum
information science.

Suppose $g$ is an element of the Clifford group. We can
define a corresponding map $h_g : SU(2^n) \to SU(2^n)$ via
$h_g(U) = g U g^\dagger$. We compute the pushforward $h_{g*}$ at $U$,
obtaining $h_{g*}(H) = g H g^\dagger$. Since $g$ is an element of the
Clifford group, it follows that $h_{g*}$ is a permutation (up to signs) with
respect to the $\sigma$ co-ordinates, and thus $h_g$ is an isometry
of $F_{1\Delta}$ and of $F_2$, but not in general of $F_{p\Delta}$ or of $F_q$.

Example: adjoint action of the unitary group
on $SU(2^n)$. Let $W$ be an arbitrary unitary, and define
an action $h_W : SU(2^n) \to SU(2^n)$ by $h_W(U) = W U W^\dagger$.
A calculation shows that $h_{W*}(H) = W H W^\dagger$, and thus
$h_W$ is an isometry of $F_2$, but is not in general an isometry
of the other local metrics.
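
The difference between the Pauli and Clifford examples can be seen directly for two qubits. The following sketch (an illustration, not part of the paper's argument) builds the matrix of $H \mapsto g H g^\dagger$ in the Pauli basis. Conjugation by a Pauli matrix gives a diagonal $\pm 1$ matrix, while the controlled-not gives a signed permutation that changes Pauli weights, which is why it need not preserve weight-dependent penalties. All function names are hypothetical.

```python
import itertools
import numpy as np

SINGLE = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
# All 16 two-qubit generalized Pauli matrices, keyed by labels such as "XZ".
PAULIS = {a + b: np.kron(SINGLE[a], SINGLE[b])
          for a, b in itertools.product("IXYZ", repeat=2)}
LABELS = sorted(PAULIS)


def pushforward_matrix(g):
    """Matrix of H -> g H g^dagger in the Pauli basis.

    Entry (s, t) is tr(sigma_s g sigma_t g^dagger) / 4, so column t holds the
    Pauli expansion of g sigma_t g^dagger.
    """
    return np.array([[np.trace(PAULIS[s] @ g @ PAULIS[t] @ g.conj().T).real / 4
                      for t in LABELS] for s in LABELS])


def is_diagonal_pm1(m):
    return np.allclose(np.abs(m), np.eye(len(m)))


def is_signed_permutation(m):
    a = np.abs(m)
    return (np.allclose(a, np.round(a)) and np.allclose(a.sum(0), 1)
            and np.allclose(a.sum(1), 1))


def weight(label):
    """Number of non-identity tensor factors."""
    return sum(c != "I" for c in label)


CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

pauli_push = pushforward_matrix(PAULIS["XZ"])
cnot_push = pushforward_matrix(CNOT)

# Does conjugation by CNOT map every Pauli to one of the same weight?
weights_preserved = all(
    weight(LABELS[int(np.argmax(np.abs(cnot_push[:, j])))]) == weight(LABELS[j])
    for j in range(len(LABELS)))

print(is_diagonal_pm1(pauli_push), is_signed_permutation(cnot_push), weights_preserved)
```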



B. Pauli geodesics

The isometries identified in the previous subsection
enable us to identify a large class of geodesics which we call
Pauli geodesics. These geodesics arise as a result of the
Pauli group isometry, and thus are geodesics for all the
local metrics we have defined, and, indeed, of any
Pauli-symmetric local metric.

To construct the Pauli geodesics we begin with a simple
proposition.

Proposition 3. Let $M$ be a Finsler manifold. Suppose
$h : M \to M$ is an isometry and $s$ is a geodesic such that
(a) $h(s(0)) = s(0)$, and (b) $h_*([s]_0) = [s]_0$. Then $s = h \circ s$
and $h_*([s]_t) = [s]_t$ for all $t$.

Proof: The proof is to observe that $s$ and $h \circ s$ are both
geodesics with the same starting point, $h(s(0)) = s(0)$,
and the same initial tangent vector $h_*([s]_0) = [s]_0$. The
geodesic equation is a second order ordinary differential
equation, and thus by the uniqueness of solutions to such
equations we deduce that $h \circ s = s$. It follows immediately
that $h_*([s]_t) = [s]_t$ for all $t$. □

As a simple but useful illustration of the proposition,
suppose we have a solution $U(t)$ to the geodesic equation
for a Pauli-symmetric Finsler metric, with initial
tangent vector corresponding to a Hamiltonian $H_0$. Suppose
$\sigma H_0 \sigma^\dagger = H_0$ for some generalized Pauli matrix $\sigma$.
Then Proposition 3 implies that $\sigma U(t) \sigma^\dagger = U(t)$ for all
$t$.

The construction of the Pauli geodesics is based on the
stabilizer formalism developed by Gottesman [59]; for a
review, see Chapter 10 of [58]. In particular, we suppose
$\sigma_1, \ldots, \sigma_n$ is a set of stabilizer generators, i.e.,
independent and commuting generalized Pauli matrices which
generate a subgroup $S$ of the full Pauli group. Suppose
$H_0 = \sum_{\sigma \in S} h^\sigma \sigma$. Then we claim that the geodesic $U$
with $U(0) = I$ and initial tangent vector corresponding
to $H_0$ is just $U(t) = \exp(-iH_0 t)$, for any Pauli-symmetric
Finsler metric. We refer to $U(t)$ as a Pauli geodesic for
the Finsler metric.

The first step of the proof is to observe that the Pauli
co-ordinates $x^\sigma(t)$ of $U(t)$ are identically zero, unless
$\sigma \in S$. To see this, suppose $\sigma \notin S$, and choose $\tau \in S$ which
anticommutes with $\sigma$. From our earlier remarks we see
that $U(t) = \tau U(t) \tau^\dagger$, and thus $x^\sigma(t) = 0$ for all $t$. We say
that a unitary satisfying this condition is $S$-invariant.

We now analyse the geodesic equation, Equation (32),
for the co-ordinates $x^\sigma$ with $\sigma \in S$. In particular,
because $U(t)$ is guaranteed to be $S$-invariant we may
effectively regard $F^2$ as a function of $x^\sigma$ and $y^\sigma$ only for
$\sigma \in S$. We have

$$\frac{d}{dt}\left(\frac{\partial F^2}{\partial y^\sigma}\right) = \frac{\partial F^2}{\partial x^\sigma}, \tag{71}$$

where all partial derivatives are evaluated at $S$-invariant
unitary matrices. But $\partial F^2/\partial x^\sigma = 0$ at such an
$S$-invariant unitary matrix, since $F^2$ has no dependence
on $x^\sigma$, by the commutativity of the elements of $S$.
Substituting this into the right-hand side of Equation (71)
and applying the chain rule to the left we obtain:

$$\sum_{\tau \in S} \left( \frac{\partial^2 F^2}{\partial x^\tau \partial y^\sigma} \frac{dx^\tau}{dt} + \frac{\partial^2 F^2}{\partial y^\tau \partial y^\sigma} \frac{dy^\tau}{dt} \right) = 0. \tag{72}$$

But $\partial^2 F^2 / \partial x^\tau \partial y^\sigma = 0$, since $\partial F^2/\partial x^\tau = 0$, and thus

$$\sum_{\tau \in S} \frac{\partial^2 F^2}{\partial y^\tau \partial y^\sigma} \frac{dy^\tau}{dt} = 0. \tag{73}$$

Using the invertibility of the Hessian we obtain $dy^\tau/dt = 0$,
and thus $x^\tau = c^\tau t$ for some constant $c^\tau$. It follows that
the solution to the geodesic equation is

$$U(t) = \exp(-iH_0 t), \tag{74}$$

as claimed.

Summing up, for a Pauli-symmetric Finsler metric
such as $F_{1\Delta}$, $F_{p\Delta}$, $F_2$ or $F_q$, when the initial Hamiltonian
$H_0$ is a sum over terms in a stabilizer subgroup, the
corresponding Pauli geodesic solution is just the exponential
$U(t) = \exp(-iH_0 t)$.



C. Minimal Pauli geodesics and the closest vector
in a lattice problem

In this subsection we study the minimal length Pauli
geodesics from $I$ to $U$, where $U$ is diagonal in the
computational basis. We show that for any right-invariant
Pauli-symmetric Finsler metric this minimal length is
equal to the solution of an instance of the closest vector
in a lattice problem (CVP). This class of Finsler metrics
includes all the Finsler metrics of most interest to us:
$F_{1\Delta}$, $F_{p\Delta}$, $F_2$, and $F_q$.

Note that the case where $U$ is diagonal in the computational
basis corresponds to the case where the stabilizer $S$
contains exactly the products of Pauli $I$ and $Z$ matrices,
e.g., for $n = 2$, $S$ contains $II$, $ZI$, $IZ$ and $ZZ$. Exactly
analogous results hold for all other choices of stabilizer,
but working with this particular stabilizer allows us to
make use of certain standard notations and nomenclature,
and so avoid the introduction of extra terminology.

One reason for specializing to unitaries diagonal in the
computational basis is that it includes a class of unitaries
of exceptional interest: those that can be written
$U_f = \sum_z (-1)^{f(z)} |z\rangle\langle z|$, where $f(z)$ is a classical Boolean
function on the $n$-bit input $z$. Kitaev's phase estimation
algorithm [60] shows that, given a single ancilla qubit, the
computation of $U_f$ requires essentially the same number
of quantum gates as computation of the function $f$ on a
quantum computer. Thus, bounds on the size of the circuit
required to compute $U_f$ are of considerable interest.

Returning to the general case of $U$ diagonal in the
computational basis, our goal in this subsection is to study
the length of the minimal Pauli geodesic between $I$ and
$U$. Of course, the quantity of real interest to us is the
length of the minimal geodesic between $I$ and $U$, without
the requirement that it be a Pauli geodesic.
Unfortunately, we can't say when it will be true that the
minimal length geodesic is going to be a Pauli geodesic.
However, the following proposition gives some hope that
this will be the case for some unitaries of interest.

Proposition 4. Let $F$ be a Pauli-symmetric Finsler
metric. Let $U \in SU(2^n)$ be diagonal in the computational
basis. Suppose the minimal length geodesic $s$ between $I$
and $U$ is unique, i.e., there is only a single curve $s$
between $I$ and $U$ with $d_F(I, U) = l_F(s)$. Then $s$ must be a
Pauli geodesic.

For the usual model spaces of Riemannian geometry
(the sphere, flat Euclidean space, or the hyperbolic space)
non-unique minimal paths are quite non-generic, suggesting
that the same may be true in the situations of interest
to us$^{13}$.

Proof: Let $\sigma$ be a generalized Pauli matrix containing
only $Z$s and $I$s. Let $s$ be the minimal length geodesic
between $I$ and $U$. Define $\tilde{s}(t) = \sigma s(t) \sigma^\dagger$. Then $\tilde{s}$ has
the same endpoints and length as $s$, and thus, by the
uniqueness of the minimal geodesic, we must have $\tilde{s} = s$.
Since this is true for all $\sigma$ containing only $Z$s and $I$s, it
follows that $s(t)$ is diagonal in the computational basis
for all $t$, and thus $s$ is a Pauli geodesic. □

Fixing $U$, which Pauli geodesics pass from $I$ to $U$?
To answer this question, choose Hermitian $H$ such that
$U = \exp(-iH)$. Let $\mathcal{J}$ be the set of traceless Hermitian
matrices which are diagonal in the computational
basis, and have diagonal entries which are integer multiples
of $2\pi$. Let $J \in \mathcal{J}$. Then for any such $J$, the curve
$\exp(-i(H - J)t)$ is a Pauli geodesic passing through $U$.

13 Compare, however, the counterexample in Subsection IV E below.

This freedom to choose $J$ actually exhausts the freedom
in the choice of Pauli geodesics$^{14}$ passing through $U$.
To see this, suppose $\exp(-iHt)$ and $\exp(-iH't)$ are two
Pauli geodesics passing through $U$ at $t = 1$. Then we have
$\exp(-iH) = \exp(-iH')$, whence $\exp(i(H' - H)) = I$,
since $H$ and $H'$ are both diagonal in the computational
basis, and thus commute. However, in order that
$\exp(i(H' - H)) = I$, we must have that $J = H - H'$
is traceless (since both $H$ and $H'$ are), and diagonal in
the computational basis, with entries which are integer
multiples of $2\pi$.

It is straightforward to verify that the set $\mathcal{J}$ has the
structure of a lattice, i.e., taking an integer linear combination
of elements of $\mathcal{J}$ produces another element of $\mathcal{J}$.
A basis for this lattice is the matrices $2\pi(|z\rangle\langle z| - |0\rangle\langle 0|)$,
where $z \neq 0$.

The length of the Pauli geodesic $\exp(-i(H - J)t)$ between
$I$ and $U$ is given by $F(H - J)$, so the length of the
minimal Pauli geodesic through $U$ is given by:

$$\min_{J \in \mathcal{J}} F(H - J). \tag{75}$$

Thus the problem of finding the minimal length Pauli
geodesic is equivalent to finding the lattice vector in $\mathcal{J}$
closest to $H$ according to the $F(\cdot)$ norm on $su(2^n)$. This
is the desired connection to the closest vector in a lattice
problem.
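
For very small $n$ the minimization in Equation (75) can be carried out by brute force. The sketch below is illustrative only: it assumes an $F_1$-type norm given by the sum of absolute Pauli coefficients, restricts the search to lattice vectors with small integer entries (`k_max`), and uses hypothetical function names. It treats the two-qubit unitary $U = \exp(-i\pi ZZ/2)$.

```python
import itertools
import numpy as np


def pauli_coefficients(diag):
    """Coefficients of a diagonal matrix in the basis of products of I and Z.

    For diagonal entries d_z the coefficient of Z^a is 2^{-n} sum_z (-1)^{a.z} d_z,
    i.e. a Walsh-Hadamard transform of the diagonal.
    """
    n = len(diag).bit_length() - 1
    h = np.array([[1.0, 1.0], [1.0, -1.0]])
    w = np.array([[1.0]])
    for _ in range(n):
        w = np.kron(w, h)
    return w @ np.asarray(diag, dtype=float) / len(diag)


def f1(diag):
    """Assumed stand-in for F_1: the sum of absolute Pauli coefficients."""
    return np.abs(pauli_coefficients(diag)).sum()


def minimal_pauli_geodesic(h_diag, norm=f1, k_max=2):
    """Brute-force minimum of norm(H - J) over J = 2*pi*diag(k), integer k, tr J = 0."""
    best_len, best_k = np.inf, None
    for k in itertools.product(range(-k_max, k_max + 1), repeat=len(h_diag)):
        if sum(k) != 0:          # J must be traceless
            continue
        length = norm(np.asarray(h_diag) - 2 * np.pi * np.array(k))
        if length < best_len - 1e-12:
            best_len, best_k = length, k
    return best_len, best_k


# U = exp(-i*pi*ZZ/2); ZZ has diagonal (1, -1, -1, 1) in the computational basis.
h_diag = np.pi / 2 * np.array([1.0, -1.0, -1.0, 1.0])
length, k = minimal_pauli_geodesic(h_diag)
print(length, k)
```

The search cost grows as $(2 k_{\max} + 1)^{2^n}$, so this only makes sense for one or two qubits; it is meant to make the lattice structure concrete, not to be a practical CVP solver.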

The connection to lattices also makes it straightforward
to construct arbitrarily long geodesics passing
through a given unitary. This is true, for example,
even in the two-qubit case. Suppose we choose
$U = \exp(-i\pi ZZ/2)$, and select $H = \pi ZZ/2 + 2\pi ZI/M$, where
$M$ is a positive integer equal to 1 modulo 4. Then
$\exp(-iHt)$ is a Pauli geodesic which first passes through
$U$ at $t = M$. By making $M$ sufficiently large we can
increase the length of this geodesic without bound.
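
This two-qubit example is easy to check numerically, since every matrix involved is diagonal. The sketch below, with hypothetical helper names, scans integer times only. That is a simplification, but it is enough to see the first return to $U$ move out to $t = M$.

```python
import numpy as np

ZZ = np.array([1.0, -1.0, -1.0, 1.0])   # diagonal of Z (x) Z
ZI = np.array([1.0, 1.0, -1.0, -1.0])   # diagonal of Z (x) I


def geodesic_point(M, t):
    """Diagonal of exp(-i H t) for H = pi*ZZ/2 + 2*pi*ZI/M."""
    h = np.pi * ZZ / 2 + 2 * np.pi * ZI / M
    return np.exp(-1j * h * t)


target = np.exp(-1j * np.pi * ZZ / 2)    # diagonal of U = exp(-i*pi*ZZ/2)


def first_integer_hit(M, t_max=100):
    """Smallest positive integer t <= t_max with exp(-i H t) = U, or None."""
    for t in range(1, t_max + 1):
        if np.allclose(geodesic_point(M, t), target):
            return t
    return None


for M in (5, 9, 13):
    print(M, first_integer_hit(M))
```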

D. Existence of exponentially long minimal Pauli
geodesics

In the previous subsection we showed that finding the
minimal length Pauli geodesic from $I$ through a diagonal
unitary $U$ is equivalent to solving an instance of CVP. In
this subsection we'll use this connection to prove that for
most such $U$ the minimal length Pauli geodesic is
exponential in length. The key is the following proposition,
pointed out to the author by Oded Regev:



14 Note that in analysing the freedom we restrict ourselves to Pauli
geodesics $\exp(-iH't)$ for which $H'$ is diagonal in the computational
basis. For some very non-generic $U$ it may be that $U$ is
diagonal in the computational basis, yet has a Pauli geodesic
passing through it for which $H'$ is not diagonal in the computational
basis. We shall ignore this possibility.

Proposition 5. Let $V$ be a $d$-dimensional vector space
equipped with the standard Lebesgue measure. Let $F$ be
a norm on $V$, and let $V_F(r)$ be the Lebesgue measure of
the ball of radius $r$ associated to $F$. Let $\mathcal{J}$ be a
$d$-dimensional lattice in $V$, and let $M$ be a matrix whose
columns are a lattice basis for $\mathcal{J}$, so the Lebesgue measure
of a unit cell in $\mathcal{J}$ is $\det(M)$. Then if a fraction $f$
($0 \le f \le 1$) of points in $V$ are within a distance $r$ of the lattice
we must have

$$f \det(M) \le V_F(r). \tag{76}$$

Proof: Consider a large volume containing $N$ lattice
points. Let $R$ be the region obtained by surrounding
each of the $N$ lattice points by the ball of radius $r$
according to the norm $F$. The total Lebesgue measure of
the region $R$ is at most $N V_F(r)$. By assumption, in the
large $N$ limit $R$ contains at least a fraction $f$ of points
in the $N$ unit cells associated to the $N$ lattice points$^{15}$,
and thus $f N \det(M) \le N V_F(r)$. Dividing by $N$ gives
the desired result. □
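
A low-dimensional illustration of the bound may help fix ideas. The sketch below is a toy example chosen here, not taken from the paper: it uses the cubic lattice $a\mathbb{Z}^2$ with the $\ell_1$ norm, estimates the covered fraction $f$ on a grid, and compares $f \det(M)$ with the $\ell_1$ ball volume $(2r)^d/d!$.

```python
import numpy as np
from math import factorial


def covered_fraction_l1(a, r, d=2, grid=400):
    """Fraction of a unit cell of a*Z^d lying within l1-distance r of the lattice."""
    # Midpoints of a uniform grid on one centred unit cell [-a/2, a/2)^d.
    ticks = (np.arange(grid) + 0.5) / grid * a - a / 2
    mesh = np.meshgrid(*([ticks] * d), indexing="ij")
    # For a cubic lattice, the nearest lattice point to any point of the centred
    # cell is the origin, so the l1 distance is just the sum of |coordinates|.
    dist = sum(np.abs(m) for m in mesh)
    return float(np.mean(dist <= r))


def l1_ball_volume(r, d):
    return (2 * r) ** d / factorial(d)


a, r, d = 1.0, 0.8, 2
f = covered_fraction_l1(a, r, d)
lhs = f * a ** d              # f * det(M), with M = a * identity
rhs = l1_ball_volume(r, d)    # V_F(r)
print(f"f = {f:.4f}, f*det(M) = {lhs:.4f}, V_F(r) = {rhs:.4f}")
```

The inequality is strict here because balls around neighbouring lattice points overlap once $r > a/2$, which is exactly the slack the proof allows for.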

To apply this result, it simplifies matters$^{16}$ to vary our
earlier approach slightly, moving from $SU(2^n)$ to $U(2^n)$,
and defining the local metrics $F_1$, $F_{1\Delta}$, $F_p$, $F_{p\Delta}$, $F_2$, $F_q$
analogously to before, but now with a contribution from
the $\sigma = I^{\otimes n}$ term. It is not difficult to show that when
$U$ is in $SU(2^n)$ it has the same minimal curves regardless
of whether we use the formulation of the local metric in
$SU(2^n)$ or $U(2^n)$.

In this formulation, Pauli geodesics exist for all our
Finsler metrics, and the minimal length Pauli geodesic
is found by minimizing $F(H - J)$, where $J$ is in the lattice
spanned by matrices of the form $2\pi|z\rangle\langle z|$, which may
be rewritten in the more convenient form
$2\pi \bigotimes_{j=1}^{n} (I + (-1)^{z_j} Z_j)/2^n$. Arranged into columns, the
corresponding matrix of lattice basis vectors has the form
$2\pi H^{\otimes n}/2^{n/2}$, where $H$ is the usual $2 \times 2$ Hadamard matrix. Thus the
conclusion of Proposition 5 is that

$$f \times \left(\frac{2\pi}{2^{n/2}}\right)^{2^n} \le V_F(r). \tag{77}$$

Let us analyse what this allows us to conclude about the
fraction $f$ of points within a distance $r$ of the lattice, for
each of our choices of Finsler metric.

15 We neglect finite-size corrections of order sublinear in $N$.

16 Analogous results hold for $SU(2^n)$, but the calculations are more
complicated, due to the more complex lattice basis.

Case: $F_{1\Delta}$. As $\Delta \to 0$ the ball of radius $r$ has volume
$V_{F_{1\Delta}}(r) \to (2r)^{2^n}/(2^n)!$. Applying Stirling's formula,
Equation (77) reduces to

$$r \ge \frac{\pi}{e}\, 2^{n/2} f^{1/2^n} \tag{78}$$

in the $\Delta \to 0$ limit. In consequence, unless $r$ is
exponentially large, at most a doubly exponentially small fraction
of diagonal unitary operators will have minimal Pauli
geodesics of length $r$ or less.
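
The passage from Equation (77) to Equation (78) can be checked numerically. The sketch below assumes the limiting volume $(2r)^{2^n}/(2^n)!$ quoted above and compares the bound obtained with the exact factorial against the Stirling form $(\pi/e)\,2^{n/2} f^{1/2^n}$. The function names are hypothetical, and $f$ is passed as a logarithm so that doubly exponentially small fractions do not underflow.

```python
from math import e, exp, lgamma, log, pi


def r_bound_exact(n, log_f=0.0):
    """Smallest r allowed by f*(2*pi/2^(n/2))^d <= (2r)^d / d!, with d = 2^n."""
    d = 2 ** n
    # Solving for r gives r = pi * 2^(-n/2) * (d!)^(1/d) * f^(1/d).
    log_r = log(pi) - (n / 2) * log(2) + lgamma(d + 1) / d + log_f / d
    return exp(log_r)


def r_bound_stirling(n, log_f=0.0):
    """Stirling form of the bound: r >= (pi/e) * 2^(n/2) * f^(1/2^n)."""
    d = 2 ** n
    return (pi / e) * 2 ** (n / 2) * exp(log_f / d)


for n in (2, 4, 6, 8, 10):
    # Take f = 2^(-2^n), a doubly exponentially small fraction.
    log_f = -(2 ** n) * log(2)
    print(n, r_bound_exact(n, log_f), r_bound_stirling(n, log_f))
```

Even with $f = 2^{-2^n}$ the factor $f^{1/2^n}$ is only $1/2$, so the bound on $r$ still grows like $2^{n/2}$.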

Case: $F_{p\Delta}$. Obviously, the minimal length Pauli
geodesics for $F_{p\Delta}$ are longer than those for $F_{1\Delta}$, provided
the penalty function satisfies $p(j) \ge 1$. Thus, unless $r$ is
allowed to be exponentially large, at most a doubly
exponentially small fraction of diagonal unitaries will have
minimal Pauli geodesics of length $r$ or less.

Case: $F_2$. Based on our previous results we expect
constant size minimal length Pauli geodesics for $F_2$. This
expectation is not disappointed. The volume formula in
this case is $V_{F_2}(r) = (\sqrt{\pi}\, r)^{2^n}/(2^n/2)!$. Applying
Stirling's formula, and setting $f = 1$, Equation (77) reduces
to:

$$r \ge \sqrt{\frac{2\pi}{e}}. \tag{79}$$




Case: $F_q$. The volume element is given by

where the product is taken over all $\sigma$ containing only $I$
and $Z$ terms. Applying Stirling's formula, Equation (77)
reduces to:




We see that provided the penalty function $q$ is chosen
appropriately, all but a doubly exponentially small fraction
of diagonal unitaries will have minimal Pauli geodesics
which are exponentially long. Such a choice is provided,
for example, by Equation (28), with $k$ exponentially
large.

E. Discussion 

In the past few subsections we've explained the construction
of the Pauli geodesics, connected the minimal
length Pauli geodesic to the solution of an instance of
CVP, shown that the minimal length Pauli geodesic is
actually the minimal length curve, provided that curve
is unique, and proved that most diagonal unitaries have
exponential length minimal Pauli geodesics. This subsection
injects some words of warning into this otherwise
encouraging situation, explaining some significant caveats
to our results.



1. On the uniqueness of minimal curves 

Based on the standard model spaces of Riemannian
geometry (the sphere, Euclidean flat space, or the
hyperbolic space), it seems plausible that the minimal
geodesics between $I$ and a diagonal unitary $U$ are generically
unique, and thus Pauli geodesics. However, this
may not always be the case for $F_p$, as the following
example shows.

Consider the Boolean function $f(z) = f(z_1, \ldots, z_n) = z_1 z_2 \ldots z_n$,
i.e., the AND of the $n$ bits $z_1, \ldots, z_n$, and the
associated unitary transformation $U|z\rangle = (-1)^{f(z)}|z\rangle$.
Using the connection to CVP, it is easy to verify that
for all of our Finsler metrics $F$ the length of the minimal
Pauli geodesic is $\pi F(|1, \ldots, 1\rangle\langle 1, \ldots, 1|)$.

Consider the case of $F_{p\Delta}$, with the penalty function $p$
chosen as $q$ was in Equation (28). A calculation shows
that as $\Delta \to 0$ the minimal Pauli geodesic has length:

$$\pi \left( \frac{2 + n + n^2}{2^{n+1}} + k \left( 1 - \frac{2 + n + n^2}{2^{n+1}} \right) \right). \tag{82}$$

When $k$ is large, this is dominated by the term $\pi k$.
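
The counting behind this length can be sketched in a few lines. The code below rests on assumptions that are not confirmed by the text in this section: that $F_p(H) = \sum_\sigma p(\mathrm{wt}(\sigma))\,|h_\sigma|$, that the penalty is $p(j) = 1$ for $j \le 2$ and $p(j) = k$ otherwise, and that the identity term (weight zero) is included, as in the $U(2^n)$ formulation. Under those assumptions it sums the penalized Pauli coefficients of $\pi|1 \ldots 1\rangle\langle 1 \ldots 1|$ and compares the result with the closed form obtained by counting the terms of weight at most two.

```python
import itertools
from math import pi


def and_geodesic_length(n, k):
    """Penalized l1 length of pi*|1...1><1...1| under the assumed penalty."""
    # pi*|1..1><1..1| = pi * prod_j (I - Z_j)/2, so every product of I's and Z's
    # appears with a coefficient of magnitude pi / 2^n.
    total = 0.0
    for bits in itertools.product((0, 1), repeat=n):
        weight = sum(bits)                      # number of Z factors
        total += (1.0 if weight <= 2 else k) * pi / 2 ** n
    return total


def closed_form(n, k):
    """Same quantity via the count (2 + n + n^2)/2 of terms with weight <= 2."""
    c = (2 + n + n * n) / 2 ** (n + 1)
    return pi * (c + k * (1 - c))


for n in (3, 5, 8):
    print(n, and_geodesic_length(n, k=1000.0), closed_form(n, k=1000.0))
```

For large $n$ almost all terms carry the penalty $k$, which is why the length approaches $\pi k$.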

By contrast, the results of Barenco et al. [28] show that
there is a quantum circuit for $U$ containing $O(n^2)$ one-
and two-qubit gates. It follows that $d_F(I,U) \le c n^2$,
for some constant $c$. To reconcile this result with
Equation (82) we see that when $k$ is large, the minimal
geodesic between $I$ and $U$ must not be a Pauli geodesic,
and therefore cannot be unique. This example, which
is easily modified to give other examples, suggests that
the applicability of Proposition 4 may be limited, at least
for some choices of Finsler metric, and highlights the need
to develop more tools for the analysis of the distance
function $d_F(I, U)$.



2. Classical simulations 

We have argued earlier in the paper that it is at least
plausible that local metrics such as $F_1$, $F_p$ and $F_q$ give
rise to distance functions $d_F(I,U)$ which are polynomially
equivalent to $m_{\mathcal{G}}(U)$. Suppose, for the sake of
argument, that we can find a Finsler metric $F$ with this
property. Suppose furthermore that for a generic unitary
diagonal in the computational basis, the minimal
length curve is unique. If this is the case, then for such
unitaries there is a circuit containing only gates diagonal
in the computational basis, and with a size polynomially
equivalent to the minimal number of gates required to
generate $U$.

This conclusion would be rather surprising, as such cir- 
cuits can be simulated with at most a polynomial over- 
head in the classical circuit model, and it would therefore 
conflict with the general belief that quantum computers 
offer a substantial complexity advantage over classical 
computers. 

Of course, there are many potential loopholes in this
argument: it makes use of many unstated assumptions
(no use of ancillas, no approximation, no uniformity
requirement) as well as several steps that, while plausible,
could easily turn out to be wrong. I cannot at present
resolve which of the many possibilities is correct, but the
argument suggests many interesting directions for further research.



V. CONCLUSION 

In this paper we have proposed a geometric approach
to the problem of proving lower bounds on the number of
quantum gates required to synthesize a desired unitary
operation. In particular, we have shown that such lower
bounds may be provided by the length of the minimal
geodesics of certain Finsler metric structures on $SU(2^n)$.

Our main results on the geodesic structure of these
Finsler metrics are: (1) a
method for computing the geodesic equation explicitly,
thus enabling numerical investigations; (2) the construction
of a large class of geodesic solutions, which we call
Pauli geodesics, passing from the identity $I$ through any
unitary $U$ which is diagonal in the computational basis;
(3) the demonstration of an equivalence between the
problem of finding the minimum length Pauli geodesic
between $I$ and $U$, and the closest vector in a lattice problem
(CVP); (4) a proof that when there is a unique minimal
length geodesic between $I$ and $U$, then that geodesic
must be a Pauli geodesic; and (5) a proof that all but a
very small fraction of diagonal unitaries $U$ have minimal
length Pauli geodesics which are of exponential length.

To make further progress it will be necessary to obtain
more insight into the space of geodesics associated to
each of our Finsler metrics. Of course, understanding
the space of geodesics associated to a Finsler metric is
a difficult problem to solve, even for relatively simple
Riemannian spaces, and much of the ongoing work in
Riemannian and Finsler geometry is motivated by this
problem.

Questions of particular interest include: (a) what are
the geodesics; (b) how long are the geodesics, and can we
find the minimal length geodesics, or at least a bound on
their length; (c) do there exist exponentially long minimal
length geodesics, and if so, can we construct some
explicit examples, and hence explicit examples of unitary
operations requiring an exponential number of gates;
and (d) for which (if any) local metric $F$ is the minimal
path length $d_F(I, U)$ polynomially equivalent to the size
$m_{\mathcal{G}}(U)$ of the minimal quantum circuit synthesizing $U$?

Broadening the scope, the results of this paper do not
yet address many issues of interest in quantum computational
complexity. In particular, our results are restricted
entirely to exact and non-uniform implementations
of a unitary operation, while the subject of most
interest for quantum computational complexity is
approximate and uniform implementations. Also from the
point of view of computational complexity, it is desirable
to obtain strong results about the impact working
qubits (i.e., ancillas) have on minimal path lengths. Finally,
it is tempting to speculate on whether a geometric
approach along the lines sketched here could ever be powerful
enough to resolve complexity class separations.
In this vein, it should be noted that results such as the
well-known no-go theorem of Razborov and Rudich [61]
(see also js^l ) suggest that to apply the geometric ap- 
proach to such separations would require deep insights 






into very specific computational problems. 

On the flip side, one might ask if these techniques can
be of any use in quantum algorithm design, either for
recovering existing algorithms, or perhaps in the design
of new algorithms. In particular, if we can find a local
metric $F$ such that $d_F(I,U)$ is polynomially equivalent
to $m_{\mathcal{G}}(U)$, then quantum circuit design may be viewed
in terms of the construction of short geodesics between
$I$ and the desired unitary operation, i.e., in terms of the
solution of a two-point boundary value problem. In a
similar vein, application of these ideas to oracle problems
and quantum communication complexity may be possible.

Further afield still, one may ask whether a similar ap- 
proach based on Finsler geometry might be taken in the 
study of classical computing. A priori this idea does 
not appear particularly promising, as classical computing 
models are usually formulated in a discrete fashion not 
amenable to study using the calculus of variations. How- 
ever, if one reformulates those models using the theory 
of continuous time, discrete state space Markov chains, 
I believe it may be possible to apply similar techniques, 
perhaps along the lines which have been explored in the 
theory of optimal stochastic control. 



APPENDIX A: APPROXIMATING LOCAL 
METRICS WITH FINSLER METRICS 

In this appendix we explain how the local metrics $F_1$
and $F_p$, which lack the smoothness and strong convexity
properties required by Finsler metrics, can be approximated
arbitrarily well by Finsler metrics.

To begin, let's formalize the notion of approximating
one local metric by another. Let $F, \tilde{F} : TM \to [0, \infty)$
be two local metrics on the manifold $M$. We say $\tilde{F}$ is
metrically equivalent to $F$ if there exist positive constants
$A$ and $B$ such that

$$A F(x,y) \le \tilde{F}(x,y) \le B F(x,y) \tag{A1}$$

for all $(x,y) \in TM$. A little thought shows that this
definition, which appears asymmetric in $F$ and $\tilde{F}$, is
actually symmetric. It is also easy to see that if $F$ and $\tilde{F}$
are metrically equivalent then they satisfy

$$A\, d_F(x_1,x_2) \le d_{\tilde{F}}(x_1,x_2) \le B\, d_F(x_1,x_2), \tag{A2}$$

for all $x_1$ and $x_2 \in M$.

Our goal in this appendix is to construct Finsler metrics
$F_{1\Delta}$ and $F_{p\Delta}$ which are metrically equivalent to $F_1$
and $F_p$, respectively, for sufficiently small values of the
positive parameter $\Delta$. Furthermore, as $\Delta \to 0$ it turns
out that $A \to 1$ and $B \to 1$ for both classes of metrics,
so Equation (A2) tells us that the notion of length given
by $F_{1\Delta}$ and $F_{p\Delta}$ approaches that given by $F_1$ and $F_p$.

Our strategy in constructing the approximating Finsler
metrics is to first study the problem of finding Minkowski
norms $N_\Delta$ which approximate a given norm$^{17}$ $N$ on $\mathbb{R}^d$.
Once we understand this problem it is straightforward to
construct the appropriate Finsler metrics.

Constructing Minkowski norms with suitable properties
does not seem easy to do directly, in large part because
of the positive homogeneity condition for norms.
We will take a more indirect approach to the definition,
defining norms in terms of their indicatrices, i.e.,
their unit spheres. This material, described in
Subsection A 1, is well known in the Finsler geometry literature
(see, e.g. 0), and our discussion merely outlines the 
major facts. The exception is Proposition 7, which seems
to be a well-known folklore result, but which I have not
found explicitly in the prior literature. Consequently, a
proof is included. Subsection A 2 uses this background
to construct the desired classes of approximating Finsler
metrics.



1. The indicatrix and the implicit definition of 
Minkowski norms 

The following proposition gives a convenient way of defining smooth norms in terms of a function $g : \mathbb{R}^d \to \mathbb{R}$ which is not necessarily positively homogeneous. To state the proposition we define $S_g$ to be the set of points $y$ such that $g(y) = 1$. We will use $g$ to construct a norm $N_g$ whose indicatrix is $S_g$.

Proposition 6. Suppose $g : \mathbb{R}^d \to \mathbb{R}$ is smooth, convex, satisfies $g(0) < 1$, and is such that $S_g$ is compact. Then the function $N_g$ defined by the equations

$$N_g(0) = 0; \tag{A3}$$

$$g\!\left(\frac{y}{N_g(y)}\right) = 1, \quad \text{provided } y \neq 0, \tag{A4}$$

exists, is uniquely defined, is smooth away from the origin, and is a norm with indicatrix $S_g$.

Proof: This is easily proved, and is a well-known result of Finsler geometry. See, for example, [45] for an outline of the proof. The only non-trivial step is an application of the implicit function theorem (see, e.g., Chapter 7 of [43]) to the implicit definition (A4) of $N_g$, in order to obtain the smoothness condition for $N_g$. □
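To make the implicit definition (A4) concrete, the following minimal sketch evaluates $N_g(y)$ numerically by bisection on the scalar equation $g(y/N) = 1$. The function names and the example `g_ellipse` are illustrative assumptions rather than anything used in the paper; the example is chosen because its norm has the closed form $\sqrt{y_1^2 + (y_2/2)^2}$.

```python
def implicit_norm(g, y, tol=1e-12):
    """Return N with g(y / N) = 1 (and N = 0 at the origin), found by bisection."""
    if all(c == 0.0 for c in y):
        return 0.0

    def excess(n):
        # Positive when y/n lies outside the indicatrix S_g, negative when inside.
        return g([c / n for c in y]) - 1.0

    lo = hi = 1.0
    while excess(lo) < 0.0:   # shrink lo until y/lo is on or outside S_g
        lo /= 2.0
    while excess(hi) > 0.0:   # grow hi until y/hi is on or inside S_g
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def g_ellipse(y):
    # Hypothetical example: smooth, convex, g(0) = 0 < 1, compact level set g = 1.
    return y[0] ** 2 + (y[1] / 2.0) ** 2


n1 = implicit_norm(g_ellipse, [3.0, 8.0])    # closed form: sqrt(3^2 + (8/2)^2)
n2 = implicit_norm(g_ellipse, [6.0, 16.0])   # positive homogeneity: twice n1
print(n1, n2)
```

The bisection relies on the hypotheses of Proposition 6: convexity together with $g(0) < 1$ means that $g$ crosses the value 1 exactly once along each ray from the origin, and compactness of $S_g$ guarantees that the bracketing loops terminate.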

When does N g obey the strong convexity constraint? 
The following proposition gives a simple criterion for N g 
to be strongly convex. 

Proposition 7. Suppose $g : \mathbb{R}^d \to \mathbb{R}$ is such that the matrix $G$ with entries $\left(\frac{\partial^2 g}{\partial y^j \partial y^k}\right)$ is strictly positive for any $y \neq 0$. Then the norm $N_g$ defined by Equations (A3) and (A4) is strongly convex.



$^{17}$ Note that in keeping with standard usage in Finsler geometry we only require norms to be positively homogeneous, not homogeneous, as is usually stipulated in other contexts.

Proof: To simplify notation we write $N = N_g$. We begin by differentiating the implicit definition Equation (A4) with respect to $y^j$, obtaining

$$N_{,j}(y) = \frac{N(y)\, g_{,j}(\hat y)}{\nabla_{\hat y}\, g \cdot y}, \tag{A5}$$

where $,j$ denotes partial differentiation with respect to $y^j$, $\hat y \equiv y/N(y)$, and $\nabla_{\hat y}$ denotes the usual gradient operator, evaluated at $\hat y$. Differentiating again, we obtain:



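$$N_{,jk}(y) = \frac{g_{,jk}(\hat y)}{\nabla_{\hat y}\, g \cdot y} + \frac{g_{,j}(\hat y)\, g_{,k}(\hat y) \sum_{lm} g_{,lm}(\hat y)\, y^l y^m}{(\nabla_{\hat y}\, g \cdot y)^3} - \frac{\sum_l \left( g_{,j}(\hat y)\, g_{,kl}(\hat y) + g_{,k}(\hat y)\, g_{,jl}(\hat y) \right) y^l}{(\nabla_{\hat y}\, g \cdot y)^2}. \tag{A6}$$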



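We also obtain

$$(N^2)_{,jk}(y) = 2N(y)\,N_{,jk}(y) + 2N_{,j}(y)\,N_{,k}(y). \tag{A7}$$

Combining these results we obtain the Hessian $H_{jk}(y) \equiv \tfrac{1}{2}(N^2)_{,jk}(y)$:

$$H_{jk}(y) = \frac{N g_{,jk}}{\nabla g \cdot y} + \frac{N g_{,j}\, g_{,k}}{(\nabla g \cdot y)^3} \left( \sum_{lm} g_{,lm}\, y^l y^m + N\, \nabla g \cdot y \right) - \frac{N \sum_l \left( g_{,j}\, g_{,kl} + g_{,k}\, g_{,jl} \right) y^l}{(\nabla g \cdot y)^2}, \tag{A8}$$

where, to simplify notation, it is implicit that all derivatives of $g$ are evaluated at $\hat y$, and $N$ is evaluated at $y$.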

Examining Equation (A8), we see that the contribution from the first term on the right-hand side is strictly positive, since both $N$ and $\nabla g \cdot y$ are strictly positive, and $g_{,jk}$ is strictly positive, by assumption. The contribution from the second term is positive, since it is a positive scalar multiple of the positive matrix with components $g_{,j}\, g_{,k}$. Thus the sum of the first two terms is strictly positive. The final term of Equation (A8) is more problematic, due to the presence of the minus sign.

The resolution is to make a linear change of variables which simplifies the Hessian. In particular, we make a linear change of variables so that $y = (\alpha, 0, 0, \ldots, 0)$, and $\nabla g = (\beta, 0, 0, \ldots, 0)$. It is not difficult to see that such a linear change of variables is always possible, and moreover does not affect whether or not the Hessian is strictly positive. It does, however, make the analysis simpler. In particular, observe that by homogeneity we have:

$$N^2(\alpha, 0, \ldots, 0) = \alpha^2 N^2(1, 0, \ldots, 0) \tag{A9}$$

$$(N^2)_{,1}(\alpha, 0, \ldots, 0) = 2\alpha N^2(1, 0, \ldots, 0) \tag{A10}$$

$$(N^2)_{,11}(\alpha, 0, \ldots, 0) = 2 N^2(1, 0, \ldots, 0). \tag{A11}$$



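Observe also that for $j = 2, \ldots, d$ we have $(N^2)_{,j}(\alpha, 0, \ldots, 0) = 0$, since $\nabla_{\hat y}\, g = (\beta, 0, \ldots, 0)$. The homogeneity of $N$ then implies that $(N^2)_{,j}(\alpha, 0, \ldots, 0) = 0$ for all $\alpha$, and thus

$$(N^2)_{,1j}(\alpha, 0, \ldots, 0) = (N^2)_{,j1}(\alpha, 0, \ldots, 0) = 0. \tag{A12}$$

In consequence, the Hessian matrix has the form:

$$\begin{bmatrix} N^2(1, 0, \ldots, 0) & 0 \\ 0 & H_{jk} \end{bmatrix}, \tag{A13}$$

where the $H_{jk}$ are for $j, k = 2, \ldots, d$. But for such values of $j$ and $k$ we have $g_{,j} = g_{,k} = 0$ and thus by Equation (A8) we have $H_{jk} = N g_{,jk} / \nabla g \cdot y$, whence the Hessian matrix has the form:

$$\begin{bmatrix} N^2(1, 0, \ldots, 0) & 0 \\ 0 & N g_{,jk} / \nabla g \cdot y \end{bmatrix}. \tag{A14}$$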



The strict positivity of this matrix now follows from our assumption that the matrix whose entries are the $g_{,jk}$ is strictly positive, and the fact that any principal submatrix of a strictly positive matrix is strictly positive. □

We can sum up the results of the last two propositions in a single theorem.

Theorem 4. Suppose $g : \mathbb{R}^d \to \mathbb{R}$ is smooth, convex, satisfies $g(0) < 1$, is such that $S_g$ is compact, and the matrix $G$ with entries $\left(\frac{\partial^2 g}{\partial y^j \partial y^k}\right)$ is strictly positive for any $y \neq 0$. Then the function $N_g$ defined by the equations

$$N_g(0) = 0; \tag{A15}$$

$$g\!\left(\frac{y}{N_g(y)}\right) = 1, \quad \text{provided } y \neq 0, \tag{A16}$$

exists, is uniquely defined, and is a Minkowski norm with indicatrix $S_g$.



2. Constructing the approximating Finsler metrics 

We now have all the tools in place to construct the desired Finsler metrics. In particular, we now explicitly construct $F_{p\Delta}$. The family of Finsler metrics $F_{1\Delta}$ follows as the special case where $p(j) = 1$ for all $j$.

Consider first the function of a single variable $g(y) \equiv \sqrt{\Delta^2 + y^2}$. This is a smooth and strictly convex function, but as $\Delta \to 0$ it approaches $|y|$. This suggests defining ($y$ is now a $(4^n - 1)$-dimensional vector):

$$g_{p\Delta}(y) \equiv \sum_\sigma p(\mathrm{wt}(\sigma)) \sqrt{\Delta^2 + (y^\sigma)^2}. \tag{A17}$$

Provided $\Delta$ is sufficiently small, it is easy to verify that $g_{p\Delta}$ is smooth, convex, satisfies $g_{p\Delta}(0) < 1$, and is such that $S_{g_{p\Delta}}$ is compact. A calculation shows that



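$$\frac{\partial^2 g_{p\Delta}}{\partial y^\sigma\, \partial y^\tau} = \frac{p(\mathrm{wt}(\sigma))\, \Delta^2\, \delta_{\sigma\tau}}{\left( \Delta^2 + (y^\sigma)^2 \right)^{3/2}}, \tag{A18}$$

which clearly specifies a strictly positive matrix. Thus $g_{p\Delta}$ induces a Minkowski norm $N_{p\Delta} \equiv N_{g_{p\Delta}}$.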
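The calculation behind Equation (A18), and the meaning of "sufficiently small", can be sketched in one variable as follows. The sketch assumes that every penalty $p(\mathrm{wt}(\sigma))$ is strictly positive, and writes $P \equiv \sum_\sigma p(\mathrm{wt}(\sigma))$.

```latex
% One-variable derivatives of the smoothed absolute value:
\frac{d}{dy}\sqrt{\Delta^2+y^2} = \frac{y}{\sqrt{\Delta^2+y^2}},
\qquad
\frac{d^2}{dy^2}\sqrt{\Delta^2+y^2}
  = \frac{1}{\sqrt{\Delta^2+y^2}} - \frac{y^2}{(\Delta^2+y^2)^{3/2}}
  = \frac{\Delta^2}{(\Delta^2+y^2)^{3/2}} > 0 .

% Since g_{p\Delta} is a sum of such terms, one per coordinate y^\sigma,
% its Hessian is diagonal with strictly positive entries, which is (A18).

% The condition g_{p\Delta}(0) < 1 reads
g_{p\Delta}(0) = \sum_\sigma p(\mathrm{wt}(\sigma))\,\Delta = P\,\Delta < 1
\quad\Longleftrightarrow\quad \Delta < \frac{1}{P} .
```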

It is intuitively clear that $N_{p\Delta}$ tends to the norm $N_p(y) \equiv \sum_\sigma p(\mathrm{wt}(\sigma))\, |y^\sigma|$ as $\Delta$ goes to zero. We can make this intuition quantitative as follows. Define $P \equiv \sum_\sigma p(\mathrm{wt}(\sigma))$ and observe that

$$N_p(y) \le g_{p\Delta}(y) \le N_p(y) + P\Delta, \tag{A19}$$

where the first inequality follows from the fact that $|y| \le \sqrt{\Delta^2 + y^2}$, and the second inequality follows from the fact that $\sqrt{\Delta^2 + y^2} \le |y| + \Delta$. These inequalities imply:

$$N_p\!\left( \frac{y}{N_{p\Delta}(y)} \right) \le g_{p\Delta}\!\left( \frac{y}{N_{p\Delta}(y)} \right) \le N_p\!\left( \frac{y}{N_{p\Delta}(y)} \right) + P\Delta. \tag{A20}$$

Observing that $g_{p\Delta}(y / N_{p\Delta}(y)) = 1$, multiplying by $N_{p\Delta}(y)$ (using the homogeneity of $N_p$), and rearranging gives

$$N_p(y) \le N_{p\Delta}(y) \le \frac{N_p(y)}{1 - P\Delta}. \tag{A21}$$

Thus provided $\Delta \ll 1/P$ we see that $N_p(y) \approx N_{p\Delta}(y)$.
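Summing up, we have defined $g_{p\Delta} : \mathbb{R}^{4^n - 1} \to \mathbb{R}$ by $g_{p\Delta}(y) \equiv \sum_\sigma p(\mathrm{wt}(\sigma)) \sqrt{\Delta^2 + (y^\sigma)^2}$. Provided $\Delta < 1/P$, where $P \equiv \sum_\sigma p(\mathrm{wt}(\sigma))$, there exists a unique function $N_{p\Delta} : \mathbb{R}^{4^n - 1} \to \mathbb{R}$ defined by $N_{p\Delta}(0) \equiv 0$ and for all other $y$ by

$$g_{p\Delta}\!\left( \frac{y}{N_{p\Delta}(y)} \right) = 1. \tag{A22}$$

This function $N_{p\Delta}$ is a Minkowski norm which approximates $N_p$ well in the sense that

$$N_p(y) \le N_{p\Delta}(y) \le \frac{N_p(y)}{1 - P\Delta}. \tag{A23}$$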
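The bounds (A23) are easy to probe numerically. The sketch below is only an illustration under stated assumptions: it uses three coordinates in place of the full $4^n - 1$, and the list `weights` is a hypothetical stand-in for the penalties $p(\mathrm{wt}(\sigma))$. It solves Equation (A22) by bisection and compares the result with $N_p(y)$ and $N_p(y)/(1 - P\Delta)$.

```python
import math


def g_p_delta(y, weights, delta):
    """Equation (A17), with weights[k] standing in for p(wt(sigma))."""
    return sum(w * math.sqrt(delta ** 2 + c ** 2) for w, c in zip(weights, y))


def n_p(y, weights):
    """The limiting norm N_p(y) = sum_sigma p(wt(sigma)) |y^sigma|."""
    return sum(w * abs(c) for w, c in zip(weights, y))


def n_p_delta(y, weights, delta, steps=200):
    """Solve g_p_delta(y / N) = 1 for N by bisection, as in Equation (A22)."""
    if all(c == 0.0 for c in y):
        return 0.0

    def value(n):
        return g_p_delta([c / n for c in y], weights, delta)

    lo = hi = 1.0
    while value(lo) < 1.0:   # shrink lo until g(y/lo) >= 1
        lo /= 2.0
    while value(hi) > 1.0:   # grow hi until g(y/hi) <= 1 (needs P * delta < 1)
        hi *= 2.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if value(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


weights = [1.0, 2.0, 4.0]   # hypothetical penalties, so P = 7
delta = 0.01                # satisfies delta < 1/P
y = [0.3, -1.2, 0.5]

lower = n_p(y, weights)
middle = n_p_delta(y, weights, delta)
upper = lower / (1.0 - sum(weights) * delta)
print(lower, middle, upper)
```

The bracket for the bisection is found without reference to (A23), so the comparison of `middle` against `lower` and `upper` is a genuine check rather than a consequence of how the root was located.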



We use $N_{p\Delta}$ to define a Finsler metric $F_{p\Delta}$ on $SU(2^n)$, via

$$F_{p\Delta}(U, y) \equiv N_{p\Delta}(\gamma), \tag{A24}$$

where $\gamma$ is the $(4^n - 1)$-dimensional vector whose components are the natural $U$-adapted co-ordinates for $y \in T_U SU(2^n)$. By construction this is a family of Minkowski norms on $SU(2^n)$. To see that $F_{p\Delta}$ is Finsler, we need only prove that it is a smooth function of $U$. This is intuitively clear. A rigorous proof follows from the results of Section III, which show how to explicitly calculate $F_{p\Delta}$. Note that $F_{p\Delta}$ approximates $F_p$ well in the sense that:

$$F_p(U, y) \le F_{p\Delta}(U, y) \le \frac{F_p(U, y)}{1 - P\Delta}. \tag{A25}$$



Using results of Ghomi [63] on smoothing of convex polytopes it is possible to extend the approximation described here to show that any right-invariant local metric can be approximated arbitrarily well by a right-invariant Finsler metric. Ghomi's results even imply that any symmetries in the local metric can be retained by the approximating Finsler metric. I expect that it is possible to extend Ghomi's results to approximation of arbitrary local metrics by Finsler metrics, but have not verified this assertion. The main disadvantage of Ghomi's constructions (a very substantial disadvantage from our point of view, and the reason they are not used in this paper) is that they involve substantially more complex computations than in the approach we have used to approximate $F_1$ and $F_p$.



APPENDIX B: PROOF OF THEOREM 2

To prove Theorem 2 we make use of the following lemma:

Lemma 1. The equation $X + X \times A = B$, where $A$, $B$ and $X$ are all three-dimensional real vectors, has the unique solution

$$X = \frac{1}{1 + \|A\|^2} \left[ B + A\,(A \cdot B) + A \times B \right]. \tag{B1}$$

Proof of Lemma 1: This solution is easily verified by hand or using any of the standard computer algebra packages. □
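As a quick numerical counterpart to the verification mentioned in the proof, the sketch below evaluates Equation (B1) for one arbitrarily chosen pair of vectors and checks the residual of $X + X \times A = B$. The helper names and the test vectors are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def solve_lemma1(a, b):
    """Return X with X + X x A = B, following Equation (B1)."""
    scale = 1.0 / (1.0 + dot(a, a))
    a_dot_b = dot(a, b)
    a_cross_b = cross(a, b)
    return [scale * (b[i] + a[i] * a_dot_b + a_cross_b[i]) for i in range(3)]


a = [0.5, -1.0, 2.0]      # arbitrary test vectors
b = [1.0, 3.0, -0.25]
x = solve_lemma1(a, b)

x_cross_a = cross(x, a)
residual = max(abs(x[i] + x_cross_a[i] - b[i]) for i in range(3))
print(residual)
```

Uniqueness can be seen from the fact that the linear map $X \mapsto X + X \times A$ has determinant $1 + \|A\|^2$, which never vanishes for real $A$.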

Proof of Theorem 2: Fixing $y$ it is clear that some $\tilde y$ satisfying Equation (44) must exist. All that we have to do is verify that $\tilde y$ has the form specified in the theorem. To do this we simply compare the order $t$ terms on the left- and right-hand sides of Equation (44). Beginning with the right-hand side we see that the term of order $t$ is

$$-it\,(\tilde y \cdot \sigma) \exp(-i x \cdot \sigma) = -t \left[ \sin(\|x\|)\, \hat x \cdot \tilde y\; I + i \left( \cos(\|x\|)\, \tilde y + \sin(\|x\|)\, \tilde y \times \hat x \right) \cdot \sigma \right]. \tag{B2}$$

To compute the terms of order $t$ obtained from the left-hand side of Equation (44) it helps to define $z \equiv x + ty$. Simple calculations show that the following relationships hold, all to first order in $t$:

$$\|z\| = \|x\| + t\, \hat x \cdot y \tag{B3}$$

$$\cos(\|z\|) = \cos(\|x\|) - t \sin(\|x\|)\, \hat x \cdot y \tag{B4}$$

$$\sin(\|z\|) = \sin(\|x\|) + t \cos(\|x\|)\, \hat x \cdot y \tag{B5}$$

$$\hat z = \hat x + \frac{t}{\|x\|}\, y_\perp, \tag{B6}$$

where $y_\perp \equiv y - (\hat x \cdot y)\, \hat x$ is the component of $y$ orthogonal to $x$. Expanding the left-hand side of Equation (44) out gives

$$\exp(-i z \cdot \sigma) = \cos(\|z\|)\, I - i \sin(\|z\|)\, \hat z \cdot \sigma \tag{B7}$$

$$= \left( \cos(\|x\|) - t \sin(\|x\|)\, \hat x \cdot y \right) I - i \left( \sin(\|x\|) + t \cos(\|x\|)\, \hat x \cdot y \right) \left( \hat x + \frac{t}{\|x\|}\, y_\perp \right) \cdot \sigma. \tag{B8}$$

It follows that the term of order $t$ on the left-hand side of Equation (44) is

$$-t \left[ \sin(\|x\|)\, \hat x \cdot y\; I + i \left( \cos(\|x\|)\, (\hat x \cdot y)\, \hat x + \mathrm{sinc}(\|x\|)\, y_\perp \right) \cdot \sigma \right]. \tag{B9}$$



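Comparing the terms in Equations (B2) and (B9) we obtain two equations:

$$\hat x \cdot \tilde y = \hat x \cdot y \tag{B10}$$

$$\cos(\|x\|)\, (\hat x \cdot y)\, \hat x + \mathrm{sinc}(\|x\|)\, y_\perp = \cos(\|x\|)\, \tilde y + \sin(\|x\|)\, \tilde y \times \hat x. \tag{B11}$$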

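We will use these equations to solve for $y$ in terms of $\tilde y$, and vice versa. Let us first express $y$ in terms of $\tilde y$. To do this it helps to note that $y_\perp = y - (\hat x \cdot y)\, \hat x = y - (\hat x \cdot \tilde y)\, \hat x$, by Equation (B10). Substituting this expression for $y_\perp$ into Equation (B11), multiplying by $1/\mathrm{sinc}(\|x\|)$ and simplifying we obtain

$$y = \tilde y_\parallel + \|x\| \cot(\|x\|)\, \tilde y_\perp + \tilde y \times x, \tag{B12}$$

where $\tilde y_\parallel \equiv (\hat x \cdot \tilde y)\, \hat x$ is the component of $\tilde y$ parallel to $x$, and $\tilde y_\perp \equiv \tilde y - \tilde y_\parallel$ is the component of $\tilde y$ orthogonal to $x$. This is the desired expression for $y$ in terms of $x$ and $\tilde y$.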

To obtain $\tilde y$ in terms of $y$ and $x$, we again start from Equations (B10) and (B11). Multiplying Equation (B11) by $1/\cos(\|x\|)$ we see it is equivalent to

$$\tilde y + \tan(\|x\|)\, \tilde y \times \hat x = (\hat x \cdot y)\, \hat x + \frac{\tan(\|x\|)}{\|x\|}\, y_\perp. \tag{B13}$$

Applying Lemma 1 and simplifying the resulting expression we obtain

$$\tilde y = y_\parallel + \mathrm{sinc}(2\|x\|)\, y_\perp + \mathrm{sinc}^2(\|x\|)\, x \times y_\perp, \tag{B14}$$

where $y_\parallel \equiv (\hat x \cdot y)\, \hat x$ is the component of $y$ in the $x$ direction. This is the desired expression for $\tilde y$ in terms of $x$ and $y$.

□ 
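The two closed forms can be sanity-checked numerically. The sketch below makes two assumptions that should be read as such: that Equation (44) has the form $\exp(-i(x + ty)\cdot\sigma) = \exp(-it\,\tilde y\cdot\sigma)\exp(-ix\cdot\sigma) + O(t^2)$, as suggested by Equations (B2) and (B7), and that $\mathrm{sinc}(\theta) \equiv \sin(\theta)/\theta$. It evaluates $\tilde y$ from Equation (B14), measures the mismatch between the two sides at small $t$, and confirms that Equation (B12) inverts Equation (B14). All function names and test vectors are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def split(x, v):
    """Return the components of v parallel and orthogonal to x."""
    scale = dot(x, v) / dot(x, x)
    par = [scale * c for c in x]
    return par, [a - b for a, b in zip(v, par)]


def y_tilde_from_y(x, y):
    """Equation (B14)."""
    n = math.sqrt(dot(x, x))
    par, perp = split(x, y)
    sinc, sinc2n = math.sin(n) / n, math.sin(2 * n) / (2 * n)
    rot = cross(x, perp)
    return [par[i] + sinc2n * perp[i] + sinc ** 2 * rot[i] for i in range(3)]


def y_from_y_tilde(x, yt):
    """Equation (B12)."""
    n = math.sqrt(dot(x, x))
    par, perp = split(x, yt)
    rot = cross(yt, x)
    return [par[i] + n / math.tan(n) * perp[i] + rot[i] for i in range(3)]


def exp_pauli(v):
    """exp(-i v.sigma) = cos|v| I - i sin|v| (v.sigma)/|v|, as a 2x2 nested list."""
    n = math.sqrt(dot(v, v))
    c = math.cos(n)
    s = math.sin(n) / n if n > 0 else 1.0
    vx, vy, vz = v
    return [[c - 1j * s * vz, -1j * s * (vx - 1j * vy)],
            [-1j * s * (vx + 1j * vy), c + 1j * s * vz]]


def matmul2(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def first_order_error(x, y, t):
    """Largest entry of exp(-i(x+ty).sigma) - exp(-it y~.sigma) exp(-ix.sigma)."""
    yt = y_tilde_from_y(x, y)
    lhs = exp_pauli([a + t * b for a, b in zip(x, y)])
    rhs = matmul2(exp_pauli([t * c for c in yt]), exp_pauli(x))
    return max(abs(lhs[r][c] - rhs[r][c]) for r in range(2) for c in range(2))


x = [0.3, -0.4, 0.5]      # arbitrary test vectors
y = [0.2, 0.7, -0.1]
err = first_order_error(x, y, 1e-3)                    # expected to be O(t^2)
round_trip = y_from_y_tilde(x, y_tilde_from_y(x, y))   # should reproduce y
print(err, round_trip)
```

If $\tilde y$ were replaced by $y$ itself the mismatch would be of order $t$ rather than $t^2$, so the size of `err` at $t = 10^{-3}$ distinguishes the two cases clearly.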



APPENDIX C: VECTORIZING MATRIX 
EQUATIONS 

In this appendix we give a brief introduction to the vectorizing technique, which can be used to convert matrix equations into equivalent vector equations. The treatment is based on [64], which is, in turn, based on material in Chapter 4 of Horn and Johnson [65].

The vectorizing technique is based on a mathematical operation known as the vec operation, which may be applied to either a matrix or a superoperator. When vec is applied to a matrix, it produces as output the vector formed by stacking all the columns of the matrix up on top of one another. More formally, let $M_{m,n}$ denote the space of $m \times n$ complex matrices. Let $A \in M_{m,n}$. Then we define $\mathrm{vec}(A)$ to be the $mn$-dimensional vector formed by stacking all the columns of $A$ up on top of one another. For example, we have:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\Longrightarrow\quad \mathrm{vec}(A) = \begin{bmatrix} a \\ c \\ b \\ d \end{bmatrix}. \tag{C1}$$

We call $\mathrm{vec}(A)$ the vectorized form of the matrix $A$.
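In code, column stacking and its inverse amount to a reordering of indices. The following minimal sketch represents a matrix as a list of rows; the function names `vec` and `unvec` mirror the notation of this appendix but the implementation is only an illustration.

```python
def vec(matrix):
    """Stack the columns of a matrix (given as a list of rows) into one flat list."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def unvec(v, m, n):
    """Rebuild the m x n matrix whose column stacking is v."""
    if len(v) != m * n:
        raise ValueError("length of v must equal m * n")
    return [[v[c * m + r] for c in range(n)] for r in range(m)]


A = [["a", "b"],
     ["c", "d"]]
print(vec(A))                  # ['a', 'c', 'b', 'd'], as in Equation (C1)
print(unvec(vec(A), 2, 2))     # recovers A
```

The shape arguments to `unvec` are required because the same vector of length $mn$ can be unvectorized into matrices of several different shapes.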

The operation vec has many useful properties, and we note only a few here (see [64, 65] for more). In particular, if $A, B \in M_{m,n}$ then $\mathrm{vec}(A) = \mathrm{vec}(B)$ if and only if $A = B$. Furthermore, for every $mn$-dimensional vector $v$ there exists a unique matrix $M \in M_{m,n}$ such that $\mathrm{vec}(M) = v$. We will write $M = \mathrm{unvec}(v)$, and speak of unvectorizing $v$ to obtain $M$. Note that for this operation to be well-defined we need to specify $m$ and $n$.

Why define vec? The answer is that it provides an algebraically and computationally convenient way of making explicit the structure of $M_{m,n}$ as a vector space.

The key algebraic fact about vec can be understood physically as a connection with maximally entangled states. Let $A \in M_{m,n}$, and let quantum systems $Q_1$ and $Q_2$ both have dimension $n$. Define an (unnormalized) maximally entangled state of $Q_1 Q_2$ by

$$|ME_n\rangle \equiv \sum_j |j\rangle |j\rangle, \tag{C2}$$

where the $|j\rangle$ are fixed orthonormal bases for systems $Q_1$ and $Q_2$, respectively. (We won't bother to distinguish the two bases notationally, although they are, of course, distinct bases.) Regarding the matrix $A$ as being defined in the basis $|j\rangle$ for $Q_2$, we have the identity

$$\mathrm{vec}(A) = (I_n \otimes A)\, |ME_n\rangle, \tag{C3}$$

where $I_n$ is the $n \times n$ identity matrix. We will omit the subscript $n$ when its value is clear from context. To prove Equation (C3) note that by linearity it suffices to prove the identity when $A = |j\rangle\langle k|$. The proof is completed by verifying that $\mathrm{vec}(|j\rangle\langle k|) = |k\rangle|j\rangle$ and $(I \otimes |j\rangle\langle k|)\, |ME_n\rangle = |k\rangle|j\rangle$.

The identity Equation (C3) has an extremely useful generalization, which Horn and Johnson ascribe to Roth [66]. The proof is straightforward algebraic manipulation, and thus is omitted.

Lemma 2 (Roth's lemma). When $A \in M_{l,m}$, $B \in M_{m,n}$, $C \in M_{n,p}$, we have

$$\mathrm{vec}(ABC) = (C^T \otimes A)\, \mathrm{vec}(B). \tag{C4}$$
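Roth's lemma is straightforward to confirm on a small example. The sketch below uses plain Python lists and integer entries so that the comparison is exact; the helper functions and the particular matrices are illustrative assumptions, with shapes $2 \times 3$, $3 \times 2$ and $2 \times 2$ chosen to exercise the rectangular case.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(p, q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[pv * qv for pv in p_row for qv in q_row]
            for p_row in p for q_row in q]


def vec(m):
    """Column stacking, as in Equation (C1)."""
    return [m[r][c] for c in range(len(m[0])) for r in range(len(m))]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


A = [[1, 2, 0], [-1, 3, 4]]        # 2 x 3
B = [[2, -1], [0, 5], [7, 1]]      # 3 x 2
C = [[1, 4], [-2, 3]]              # 2 x 2

lhs = vec(matmul(matmul(A, B), C))               # vec(ABC)
rhs = matvec(kron(transpose(C), A), vec(B))      # (C^T kron A) vec(B)
print(lhs == rhs)
```

Note that the identity depends on the column-stacking convention for vec; with row stacking the roles of the two Kronecker factors would be interchanged.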



Roth's lemma is extremely helpful in the analysis of linear matrix equations, such as $\sum_j A_j X B_j = C$. From Roth's lemma, we see that this is equivalent to the equation

$$\left( \sum_j B_j^T \otimes A_j \right) \mathrm{vec}(X) = \mathrm{vec}(C), \tag{C5}$$

which may be solved using standard techniques.

This discussion suggests defining a vectorized form for a superoperator. In particular, given a superoperator $\mathcal{E}(\cdot)$, we define a vectorized form of $\mathcal{E}$ as follows. First, note that $\mathcal{E}$ can always be written in the form $\mathcal{E}(X) = \sum_j A_j X B_j$ for some set of matrices $A_j$ and $B_j$. Then the vectorized form of $\mathcal{E}$ is defined by

$$\mathrm{vec}(\mathcal{E}) \equiv \sum_j B_j^T \otimes A_j. \tag{C6}$$

It is not difficult to show that $\mathrm{vec}(\mathcal{E})$ defined in this way is unique, i.e., it does not depend on the particular representation in terms of a set of $A_j$ and $B_j$ operators. By Roth's lemma we have

$$\mathrm{vec}(\mathcal{E})\, \mathrm{vec}(X) = \mathrm{vec}(\mathcal{E}(X)). \tag{C7}$$



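With these definitions we see that the vectorized forms of the superoperators $\mathcal{I}$ and $\mathrm{ad}_X$ are given by

$$\mathrm{vec}(\mathcal{I}) = I \otimes I \tag{C8}$$

$$\mathrm{vec}(\mathrm{ad}_X) = I \otimes X - X^* \otimes I, \tag{C9}$$

where in the second line we assumed that $X$ is Hermitian, and thus $X^T = X^*$.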
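Equation (C9) can be checked directly against the definition $\mathrm{ad}_X(Y) = [X, Y]$, using Equation (C7). The sketch below does this for one hypothetical $2 \times 2$ Hermitian $X$ and an arbitrary $Y$, with entries chosen as small Gaussian integers so that the floating-point comparison is exact; all names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(p, q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[pv * qv for pv in p_row for qv in q_row]
            for p_row in p for q_row in q]


def vec(m):
    """Column stacking, as in Equation (C1)."""
    return [m[r][c] for c in range(len(m[0])) for r in range(len(m))]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


identity = [[1, 0], [0, 1]]
X = [[1, 2 - 1j], [2 + 1j, -3]]     # hypothetical Hermitian matrix
Y = [[1j, 2], [3 - 2j, 4]]          # arbitrary matrix

X_conj = [[complex(v).conjugate() for v in row] for row in X]   # X* (= X^T here)

commutator = subtract(matmul(X, Y), matmul(Y, X))               # ad_X(Y)
vec_ad_X = subtract(kron(identity, X), kron(X_conj, identity))  # Equation (C9)

print(vec(commutator) == matvec(vec_ad_X, vec(Y)))
```

For a non-Hermitian $X$ the same check goes through with $X^*$ replaced by $X^T$, which is the form that Roth's lemma gives before the Hermiticity assumption is used.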

The vec operation for superoperators has all the algebraic properties one would expect. It is linear in $\mathcal{E}$, and a homomorphism, i.e., $\mathrm{vec}(\mathcal{E}_1 \circ \mathcal{E}_2) = \mathrm{vec}(\mathcal{E}_1)\, \mathrm{vec}(\mathcal{E}_2)$. That is, vec converts composition of linear superoperators into matrix multiplication. As a consequence, we see that if $f(x) = \sum_j f_j x^j$ then $\mathrm{vec}(f(\mathcal{E})) = f(\mathrm{vec}(\mathcal{E}))$. Using this result, followed by Equation (C9), we deduce that

$$\mathrm{vec}(\exp(-i\, \mathrm{ad}_X)) = U^* \otimes U, \tag{C10}$$

where $U \equiv \exp(-iX)$. It follows that the operation $\mathcal{E}_X$ defined in Subsection III E has vectorized form:

$$\mathrm{vec}(\mathcal{E}_X) = \frac{U^* \otimes U - I \otimes I}{-i\,(I \otimes X - X^* \otimes I)}. \tag{C11}$$